||||
|\/|
|/\|
||||
SIB-PAIR 0.99.9 (27 May 2003)
by
David L. Duffy
(1995-2003)
A program for elementary genetical analyses
David L. Duffy, MBBS PhD.
Queensland Institute of Medical Research,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au
CONTENTS
Program Sib-pair performs a number of simple analyses of family data that
tend to be "nonparametric" or "robust" in nature. It is modelled to some
extent on the Genetic Analysis System [Young, 1995] in terms of the command
language and types of analysis. Included are routines for:
- Imputation of genotypes where unequivocal.
- Allele frequencies in codominant genetic systems.
- Segregation ratios for a binary trait.
- Familial correlations and sibship variances for a quantitative
trait. Variance components analysis of quantitative traits.
- Haseman-Elston sib-pair regression for a quantitative (or binary) trait
using full and half-sib data, and variance components linkage
analysis for normally distributed quantitative traits.
- Fulker and Cardon interval Haseman-Elston regression for a quantitative
(or binary) trait.
- Multiple allele versions of the transmission-disequilibrium test.
- Test allelic association with a binary or quantitative trait -- Monte
Carlo simulation of null distribution.
- Single locus Affected Pedigree Method identity-by-state and
identity-by-descent linkage analysis. This includes Wards [1993] extensions
to include unaffected pedigree members.
- Writes locus and pedigree files in the formats used by the programs
APM, Arlequin, ASPEX, CRIMAP, FISHER, GAS, GDA, Genehunter, LINKAGE,
LOKI, MENDEL, MERLIN, PAP and SAGE.
An example of use
Sib-pair is command line oriented, and writes output only to the
standard output (the screen if you are using it interactively). Therefore
output can be saved to a file only in batch mode, or via another program that
can copy from the standard output, such as tee.
Bearing this in mind, here is a sample interactive session. We start
at the command line of your operating system shell:
> sib-pair
|||| SIB-PAIR: A program for simple genetic analysis
|\/| Version : Version 0.99.9 (29-Jul-2002)
|/\| Author : David L Duffy (c) 1995-2001
|||| Job run : Mon Jul 29 16:58:55 2002
>>
A double arrow command prompt appears. We read in a previously prepared
script:
>> include ex.in
The contents of ex.in are a series of Sib-pair commands:
# declare four loci
set loc a affection
set loc b quantitative
set loc m1 marker 0.0 cM
set loc m2 marker 5.1 cM
# read the pedigree data
read ped inline
ex1 1a x x m n x 1 3 1 2
ex1 1b x x f n x 1 2 3 4
ex1 2a 1a 1b m n 3.5 1 2 1 3
ex1 2b x x f n 1.1 2 2 2 3
ex1 3a 2a 2b m y 4.3 1 2 1 2
ex1 3b 2a 2b m n 2.0 2 2 2 3
ex1 3c 2a 2b f n 0.8 2 2 3 3
ex1 4a 3c 3d f y x 1 2 2 3
ex1 3d x x m n x x x x x
ex1 4b 3b 3e m y 4.7 1 2 3 4
ex1 4c 3b 3f m n 1.6 2 2 1 3
;;;;
# The four semicolons ends the in-line data
run
The "run" command actually starts the initial processing of the
pedigree.
NOTE: Imputation level 1. Imputing untyped parental
genotypes where unequivocal.
Pedigree file = inline.ped
Number of loci = 4
Locus Type Position
---------- ---- --------
a a 6
b q 7
m1 m 8-- 9
m2 m 10-- 11
Number of marker loci= 2
Bonferroni corr. 5% = 0.025321
Bonferroni corr. 1% = 0.005013
Bonferroni corr. 0.1%= 0.000500
NOTE: Creating dummy record for ex1-3e.
NOTE: Creating dummy record for ex1-3f.
NOTE: Father and mother of person ex1-4a appear to be swapped around. Reordering.
NOTE: Person ex1-3e appears as a mother and sex was unspecified.
Setting sex to female.
NOTE: Person ex1-3f appears as a mother and sex was unspecified.
Setting sex to female.
Pedigree: ex1 No. members: 13 No. founders: 6 No. sibships: 5
Total number of pedigrees = 1
Number with only 1 member = 0
Total number of sibships = 5
Total number of subjects = 13
Total subjects genotyped = 10 ( 76.9%)
Total number of genotypes = 20
Largest pedigree (members) = 13 (Pedigree ex1)
Deepest pedigree (genrtns) = 4 (Pedigree ex1)
Mean size of pedigrees = 13.0
Mean pedigree depth = 4.0
Mean size where >1 members = 13.0
Mean depth where >1 geners = 4.0
Number of imputed genotypes= 0
Number of pedigree errors = 0
Number of deleted records = 0
We obtain a few routine messages and summary statistics. The small table
of Bonferroni corrections is a reminder that Sib-pair does not usually
correct P-values for multiple tests; it is up to the user to decide what the
appropriate significance thresholds are.
Blank records are created for named but missing parents. This is
necessary, as a formal pedigree should have none or both parents present for
each person.
>> ls
a* b* m1 m2
The "ls" commands shows trait loci (with an asterisk appended),
and marker loci.
>> drop $m
>> ls
>> undelete
a* b* (m1) (m2)
The "drop" commands drops loci from the scope of subsequent
commands, while the "undelete" command returns access to all
loci.
>> describe m1
------------------------------------------------
Allele frequencies for locus "m1 "
------------------------------------------------
Allele Frequency Count Histogram
1 0.3000 6 ******
2 0.6500 13 *************
3 0.0500 1 *
Number of alleles = 3
Heterozygosity (Hu) = 0.5105
Poly. Inf. Content = 0.4064
Number persons typed = 10 ( 76.9%)
The "describe" command gives summary information about loci.
For marker loci, it tabulates simple allele counts and proportions in the
dataset. For traits, it gives familial correlations or recurrence risks.
The "tabulate" gives simpler tables:
>> tab a
a x: 2 y: 3 n: 8
>> tab a m1
------------------------------------------------
Cross-tabulation of "a " ... "m1 "
------------------------------------------------
m1
a 1/2 1/3 2/2
---------------------------------------
n 2 (.286) 1 (.143) 4 (.571)
y 3 (1.00) 0 (.000) 0 (.000)
No. complete observations = 10
LR contingency chi-square = 5.5
Degrees of freedom = 2
P-value =0.0643
There are a few other simple descriptive commands. The
"count" command gives information about families and
individuals:
>> count b>1
Count where "b > 1":
<
Pedigree Con=T Num ASPs Trios 4+
---------- ----- ----- ----- ----- -----
ex1 6 13 1 0 0
Total 6 13 1 0 0
The "select" command selects pedigrees containing a specified
number (or one or more) individuals meeting the criterion. The
"print" command is individual oriented:
>> print b>1
Print where "b > 1":
id=ex1-2b sex=f a=n b=1.1000 m1=2/2 m2=2/3
id=ex1-4c sex=m a=n b=1.6000 m1=2/2 m2=1/3
id=ex1-4b sex=m a=y b=4.7000 m1=1/2 m2=3/4
id=ex1-3a sex=m a=y b=4.3000 m1=1/2 m2=1/2
id=ex1-3b sex=m a=n b=2.0000 m1=2/2 m2=2/3
id=ex1-2a sex=m a=n b=3.5000 m1=1/2 m2=1/3
Number of matched persons = 6 out of 13 ( 46.2%)
Number of matched pedigrees = 1 out of 1 (100.0%)
If we had instead issued:
>> keep $m
>> print b>1
we would obtain:
Print where "b > 1":
id=ex1-2b sex=f m1=2/2 m2=2/3
id=ex1-4c sex=m m1=2/2 m2=1/3
...
The "associate" command gives results from family based
tests of allelic association for a trait versus all active marker
loci:
>> ass a
--------------------------------------------------
Allelic association testing for trait "a "
--------------------------------------------------
Marker Typed Allels Chi-square Asy P Emp P Iters
---------- ------ ------ ---------- ------ ------ ------
m1 10 3 1.9 0.3930 0.1575 127 AssX2-HWE .
m1 1 3 1.1 0.5978 0.5978 23 RC-TDT .
m2 10 4 0.9 0.8192 0.7692 26 AssX2-HWE .
m2 1 4 2.1 0.4471 0.4471 160 RC-TDT .
The "sibpair" command gives results from regression based
tests of linkage for a trait versus all active marker loci. The
P-values for these tests can be "empirically" estimated by
gene-dropping marker alleles under the null hypothesis of no
linkage:
>> sib b simulate
---------------------------------------------------------
Sham S+D H-E for trait "b " v. all markers
---------------------------------------------------------
Marker FSibs HSibs t-value Asy P Emp P Iters
---------- ------ ------ ---------- ------ ------ ------
m1 3 1 2.6 0.1180 0.0288 695 H-E +
m2 3 1 1.5 0.1908 0.2128 94 H-E .
Finally, we log transform the quantitative trait values and write out a
Genehunter type pedigree file, so we can further examine our data using a
multipoint program:
>> b=log(b+1)
>> keep b -- m2
>> write locus gh b.loc dummy
>> write gh b.pre dummy
The "dummy" keyword adds a dummy binary trait variable to the
Genehunter pedigree file, so that program will calculate ibds for
us. If we like, Genehunter can be run from within Sib-pair:
>> $ gh
The "$" command shells out to run another program.
When we exit from Genehunter, we will be returned to Sib-pair:
>> quit
This job took 1.8 minutes
There are a number of operations useful for manipulating pedigree data
prior to analysis. These allow you to prune ("prune")
pedigrees down to selected individuals and only those relatives needed
to connect the index people, to reduce families to unrelated cases and
controls ("case"), to break up large pedigrees into component
nuclear families ("nuclear") or into unrelated cliques
("subped"), if the pedigree file does not specify all the
connecting relatives (between different branches say). One can also
select ("select" and "unselect") particular groups
of pedigrees, and specify different values of a variable in each
group:
# Select out ASP nuclear families
>> nuclear
>> select containing exactly 2 where dementia and isnon and anytyp
# or EDAC nuclear families
>> unselect
>> select containing 1 where IgE>1000 and isnon and anytyp
>> select containing 1 where IgE<50 and isnon and anytyp
Sib-pair also can be used as a calculator, and has a few genetics
utilities, notably the "sml" and "grr" commands
which gives expected recurrence risks, ibd's and genotype
frequencies for a specified diallelic model:
>> sml 0.01 0.5 0.2 0.1
------------------------------------------------
Single Major Locus Recurrence Risk Calculation
------------------------------------------------
Frequency(A): 0.010000; Pen(AA): 0.500; Pen(AB): 0.200; Pen(BB): 0.100
Trait Prev : 0.102020; Pop AR: 2.0%; Var(Add): 0.000206; Var(Dom): 0.000004
Measure MZ Twin Sib-Sib Par-Off Second
---------- ---------- ---------- --------- ----------
Rec risk 0.104 0.103 0.103 0.103
Rel risk 1.023 1.011 1.011 1.006
Odds rat 1.025 1.012 1.012 1.006
PRR 1.020 1.010 1.010 1.005
ibd|A-A 1.000 0.502 0.500 0.251
ibd|A-U 1.000 0.500 0.500 0.250
Freq of A if Affected: 0.019898 (0.000,0.039,0.961)
Freq of A if Unaffctd: 0.008875 (0.000,0.018,0.982)
Mating Proportion Risk to offspring
---------- ----------- ------------------
UnA x UnA 0.806 0.102
Aff x UnA 0.183 0.103
Aff x Aff 0.010 0.104
Analytic Methods
Imputation of unobserved genotypes. This is performed using the
algorithm described by Lange & Goradia [1987]. Firstly, (0) A
phenoset (all possible genotypes) for one locus is generated for each
individual in the pedigree. Then, iterate by nuclear families, repeating the
next two steps until no further updates: (1) Parental genotypes
inconsistent with their offspring are removed; (2) child genotypes
inconsistent with their parents are removed. Finally, (3) If zero
genotypes remain, report an inconsistency; if one genotype remains, this
becomes the imputed genotype; if the joint spouse genotype is unambiguous,
but the specific genotype each spouse carries is ambiguous, if requested,
randomly assign a genotype to each parent. These latter genotypes are used
only for calculation of statistics for offspring of the pair, but not for the
parents themselves. A further extension is to sequentially (founders then
nonfounders) impute the remaining missing genotypes as the most likely member
of the phenoset. This is always performed to give starting values for the
MCMC methods, but these simulated genotypes will not be saved unless the
imputation level is set to "3" or "full".
Allele frequencies. These are for codominant systems only. Either a
straight allele count is used, or the contribution of each pedigree is
weighted by the number of founders it contains. Alternatively, the
imputed and observed genotypes in the founders can be counted.
Hardy-Weinberg proportions. These are tested by a Pearson
chi-square, with the P-value estimated via "gene-dropping" (Monte-Carlo,MC)
simulation. These are based on the genotypes in the founders of that
pedigree, where typed, or on the gene frequencies in the total sample where
the founder is untyped, and the structure of the pedigree.
Segregation ratios. The default analysis assumes the pedigrees are
unascertained, and gives the naive estimators. The ascertainment corrected
analysis is for nuclear families and follows Davie [1979]: p=(r-j)/(t-j),
where p is the risk, r is the total number of affected children, j, the
number of sibships containing exactly one proband, t, the number of children.
A fairly efficient approximate sampling variance is also given.
Haplotype reconstruction. This is performed on a nuclear family by
family basis, though incorporating grandparental information where available.
Initial reconstruction is performed using a simulated annealing algorithm
that maximizes a sharing based criterion based on length of runs of the same
alleles on a putative chromosome among sib pairs, parent-offspring pairs, and
grandparent-child pairs. The order of loci in the pedigree file is treated as
the linkage order, and map distance information is not used. Missing
parental haplotypes are filled in using the childrens' haplotypes in a simple
fashion. Mendelian inconsistencies are flagged in the printout.
The second algorithm is more ambitious and attempts to construct
recombination minimized haplotypes, again on a nuclear family by family
basis, and dealing more intelligently with missing data. A simulated
annealing algorithm using multiple restarts is used. Recombination events and
Mendelian errors are flagged in the output.
Admixture analysis. This refers to testing for a mixture of
specified distributions in the empirical distribution of a quantitative
trait, usually a mixture of normals, though Sib-pair also offers mixtures of
exponential and Poisson distributions. Information from the relationship
between family members is not utilised. The usual EM approach is used.
Test for normality. The Filliben correlation [Filliben 1975]
is calculated as a test for normality. This is the correlation between
the observed data and the rankits expected under the assumption of
normality. The P-value for this statistic is approximate, and is
produced using an approach modelled on that used by Royston [1993] for
the related Shapiro-Francia W':
log(1-rf) ~ N(m,s)
m:=1.0402 (log(log(n))-log(n)) - 1.99196
s:=0.788392/log(n)) + 0.31293
where n is the sample size. This performs reasonably well versus
the empirical percentiles:
| | Coverage
|
|---|
| n | 5 | 10 | 20 | 50 | 100 | 500 | 1000 | 5000
|
|---|
| Nominal P=0.05 | 0.021 | 0.044 | 0.055 | 0.057 | 0.059 | 0.052 | 0.045 | 0.033
|
|---|
| Nominal P=0.01 | 0.00 | 0.006 | 0.009 | 0.014 | 0.019 | 0.007 | 0.007 | 0.007
|
|---|
A common use of the Filliben correlation is as the criterion for
selecting an optimal transformation of the data.
The J0.02 statistic is a variance-corrected order-statistic
based skewness measure:
[(P0.02+P0.98)/2-P0.50]/[P0.75-P0.25]
[David & Johnson 1956]. A test of normality can be constructed,
using the standard error of J0.02. This is based on
interpolation of results from Monte-Carlo simulations
(SE(J0.02):=1.36/sqrt(N) cf Resek [1974]).
Sibship variance test. This is the linear model suggested by Fain
[1977] for the detection of the phenotypic effects of quantitative trait
loci. Briefly, if parental trait values are at the extreme of the population
distribution, then they will be carrying multiple increasing or decreasing
alleles at the QTLs. As a result, the trait variance among their offspring is
decreased compared to sibships whose parents have trait values close to the
population mean. This U-shaped relationship between midparent value and
sibship variance can be detected by fitting linear or quadratic curves.
Variance components analysis. This is the usual analysis of
quantitative traits assuming multivariate normality. The log-likelihood:
LL = -0.5 [log(det(S)) + (y-u)' inv(S) (y-u)]
is maximized, where S is the variance-covariance matrix for the
trait values for each phenotyped pedigree member, and y and u
are the trait values and their expected values (the overall trait mean).
The main diagonal elements of S takes the value:
VA+VQ+VD+VE,
and the off-diagonal elements:
Ri,jVA+ibdi,jVQ+Ki,jVD,
where,
Ri,j is the coefficient of relationship for the
i-jth pair of relatives
K is the coefficient of fraternity
ibd is the average ibd sharing at the marker location being
tested for linkage to a QTL.
The maximization is performed using AS319 (variable metric minimizer
with numerically estimated gradients). The phenometric models (ADE, AE,
E) are fitted to the intact entire pedigree, but the models including
VQ are fitted to sibships only (at present).
Binary trait association analysis. This is the Pearson
goodness-of-fit based test for equality of allelic gene frequencies at a
marker locus in individuals expressing or not expressing a binary trait (2xN
table). Both the nominal (ignoring relatedness of the sample) and empirical
P-values for the test are output. The empiric P-value is estimated via
gene-dropping simulation. These are based on the gene frequencies in the
total sample, and the structure of the pedigree, and are conditional on the
observed occurrence of the binary trait in the families.
Formerly, a sibship permutation based test was provided, calculating
the same chi-square statistic for members of sibships that contain at
least one affected and one unaffected typed individual, and generating a
(within-sibship) permutation P-value. This is now replaced by a more
powerful score test combining the within-sibship association and
transmission-disequilibrium tests after Knapp [1999] and Laird et al
[2000]. In this approach, the complete or partial genotype of untyped
parents is reconstructed from the genotypes of the affected and unaffected
children. The transmission of alleles from these reconstructed parents
is conditioned on the genotypes of the children used in the reconstruction.
The appropriate conditional distribution under the null hypothesis is
approximated via Monte Carlo simulation and rejection sampling.
Population genetic F statistics (FIS, FIT and
FST) are also calculated for each marker assuming affected
and unaffected individuals come from separate related demes. These are
estimated following the approach of Pons and Chaouche [1995], as
described by Excoffier [2001].
Quantitative trait association analysis. This fits an additive
(allelic means) model predicting an individual's trait value from his/her
genotype at a marker locus. The residual sum-of-squares is compared to those
obtained via a gene-dropping simulation of the pedigrees, giving an empiric
P-value.
Two-locus linkage disequilibrium estimation. This algorithm
finds informative founder matings, or informative matings where all the
grandparents are untyped, and imputes the two-locus haplotypes transmitted
to the offspring. The loci are assumed to be tightly linked, so that
four parental haplotypes are counted, as opposed to the more usual
two haplotypes from the child. Both D and D' measures
are calculated.
Homozygosity analysis. This tests for an increase in observed
homozygosity at a marker locus in individuals expressing a binary trait,
comparing this to the predicted homozygosity based on the allele frequencies
in the total sample. This may occur in the presence of allelic association
with a recessive trait locus, and/or deletional loss of heterozygosity (the
parents would be untyped for such an individual not to have been flagged as a
Mendelian inconsistency of course). A one-tailed empiric P-value is estimated
via "gene-dropping" simulation, based on the gene frequencies in the total
sample, and the structure of the pedigree.
Transmission-disequilibrium test. The original formulation of the
TDT is for a diallelic marker [Spielman et al 1993]. The TDT statistic
calculated by the program is the Pearson goodness-of-fit based test of
symmetry in the square table of transmitted versus nontransmitted alleles to
each affected child [Haberman, 1979]. Empiric P-values are produced by
randomization of the table. This global allelic form does not correct for the
(usually small) correlation in parental genotypes induced by linkage
disequilibrium (absent of course under the null hypothesis). Another allelic
test provided is the marginal allelic test suggested by Spielman and Ewens
[1996], which is probably slightly more powerful than the global symmetry
test [Kaplan et al 1997].
The genotypic TDT P-value is estimated via gene-dropping based on the
genotypes of typed ancestors of probands (where both parents of the proband
and all antecedents must be typed), and the structure of the pedigree. The
test statistic compares the observed number of each genotype transmitted with
the number expected based on the parental genotypes. Pairs of cells whose
total count is less than a given cutoff may be excluded from the analysis to
increase power. The P-value for the TDT testing each allele versus all others
in turn ("allele-by-allele") is the exact two-sided binomial probability (via
the beta distribution). When the plevel is zero, only the best
of the allele-by-allele test results are printed, and the P-value is
Bonferroni corrected for the number of alleles at that marker.
Note that the default option is to use probands for the TDT only one
parent is typed. For the diallelic marker case with unequal allele
frequencies, using one parent families does lead to biased results. The
unified transmission test available via the "assoc" command
does not suffer from this problem (see above).
The Schaid and Sommer [1993] genotypic risk ratio tests for familial
association under the assumption of Hardy-Weinberg equilibrium, or
conditional on parental genotypes, is also offered. This uses
log-linear modelling (implemented as iteratively reweighted least
squares) of a biallelic locus with both parents genotyped. The attributable
risk is also produced.
Sib-pair analysis. Identity by descent estimation is based on the
sib pair and parental genotypes when available. In the case of untyped
parents, the full-sib sharing is the sum of sharing for each possible set of
parental genotypes weighted by their likelihood based on all children in the
sibship. Half-sib sharing is estimated based only on known genotypes, whether
observed or unequivocally imputed. The effective degrees of freedom for the
t-test of the slope of the regression line is given as the number of
individuals in the sample (counted once only) who are in a sib pair where
both members are typed at trait and marker, minus two. For a sample made up
of nuclear families (no halfsibs), this will be equivalent to the SUM(sibship
size-1)-2 value used by SIBPAL 2.6, and originally suggested by Hodge
[1984]. For binary traits, the same ordinary-least-squares analysis is
performed -- the t-statistics from these results are only really
applicable to large samples, and tend to be too liberal. The quantity
regressed is not the usual Haseman-Elston squared trait difference, but
a function of the squared trait sums and differences following Sham and
Purcell [2001]. This approach is supposed to approach the power of the
variance components approach according to those authors,
and gives appropriate Type 1 error rates.
For the Fulker & Cardon methods, the expected ibd through the
interval between the two markers is estimated using the equation given in
Olson [1995]. Haseman-Elston regressions are performed at a series of points
across the interval using the ibd sharing of the two flanking markers,
and the given size of the interval. The Haldane mapping function is used.
Affected sib pair analysis. This is the original IBS based approach
described by Lange [1986a], extended to half-sibs as per Bishop and
Williamson [1990]. No correction for sibship size is made -- that is all
possible pairs are treated as independent. The usual two d.f. chi-square is
calculated, with expected counts being calculated based on the observed gene
frequencies in the total sample. The IBD based mean test is also
calculated.
Affected Pedigree Method. This uses the measures of genetic
similarity described by Weeks and Lange [1988; see also, Lange, 1986a,
1986b], Whittemore and Halpern [1994], and Ward [1993, 1995]. The expected
mean and variance for each pedigree is estimated via gene-dropping
simulation. These are based on the observed gene frequencies in the total
sample, and the structure of the pedigree. Both ibs and ibd based family
scores can be estimated. The original APM statistics, the APM statistic of
Whittemore and Halpern and the T(AB) and GPM statistics of Ward are
calculated.
Multilocus IBS sharing statistics. These are used to confirm the
pedigree structure using marker data. One approach calculates the overall
probability of sharing two alleles IBS for full and half sibs, summing over
all loci, and ignoring any linkage between markers. The second approach uses
gene dropping based on the given marker map. Two lists are generated, one by
individual, the other by relative pair.
Martingale residuals. The elegant approach of Commenges [1994] to
genetic analysis of age-at-onset is to analyse the residuals obtained from a
nonparametric or semiparametric survival analysis. Sib-pair implements the
former, calculating the martingale residuals using the Nelson-Aalen estimator
for the integrated hazard [eg Andersen et al 1993], which are then
transformed following Therneau et al [1990] to give a more symmetrical
distribution.
Monte Carlo Algorithms
Gene-dropping. The "unconditional" algorithm producing null
distributions is as follows. Repeat the following 1-3 steps a large
number of times. (1) Founder genotypes are assigned using the allele
frequencies in the observed sample, assuming panmixia and Hardy- Weinberg
equilibrium (HWE). Iterate, until all genotypes are assigned: (2) If
both parental genotypes are nonmissing, randomly assign the index a genotype
based on Mendelian autosomal inheritance (ie if parental genotypes are {1/2}
and {3/4}, a child's genotype is randomly selected from {{1/3}, {1/4}, {2/3},
{2/4}}, with each genotype having a probability of selection of 0.25). Once
complete, (3) calculate the test statistic based on the family's simulated
genotype. Following completion of the outer loop, (4) summarize the
distribution of the resulting test statistic. This procedure is used to
generate null distributions for the association Pearson chi-square.
For the share command, this also allows for recombination between
markers based on the given linkage map
The null distributions for the genotypic marginal TDT is generated using a
"conditional on founders" algorithm, that takes observed founder/ancestor
genotypes as given. Only typed nonfounders genotypes where both parents were
typed are simulated.
ibd estimation. The modification to calculate ibd
distributions gives each founder (two) unique alleles in his/her "typing
genotype". A simple gene-drop gives the null distributions for the ibs
and ibd statistics of the APM method.
I based the "conditional on observed genotypes" (gene-drop with rejection
sampling) algorithm for calculating ibd on that described by Blangero
et al [1995]. As before, each founder is assigned a typing genotype made up
of two unique alleles. Offspring are only assigned ibd-typing
genotypes that are consistent with the observed genotype at the observed
locus of interest. For example, say the observed genotypes in the parents are
100/102 and 100/102, and the typing genotypes associated with these are set
to {1/2} and {3/4} respectively. If the child is 100/102, assignment of
typing genotypes {1/3} and {2/4} will be rejected. This is equivalent to a
child's genotype being randomly selected from {{1/4}, {2/3}}, with each
genotype having a probability of selection of 0.5. The resulting ibd
statistics based on the typing genotype will approximate ibd for the
marker locus of interest.
Missing genotype simulation by Monte-Carlo Markov Chain (MCMC). The
calculation of ibd in the presence of missing genotypes is performed
via a Metropolis algorithm. This algorithm is a multiallelic extension of
that described by Lange and Matthysse [1989]. One iteration of the generation
of a legal constellation of imputed and observed genotypes is produced
by:
(1) Perform (a) (b) or (c):
(a) Simulate ibd, then "mutating" up to four imputed founder
alleles. These propagate through the pedigree using the current pattern of
ibd transmission as indicated by the ibd-typing alleles, and
are rejected and resimulated if an (unordered) inconsistency with an observed
genotype occurs.
(b) Simulate ibd, then switch the parent of origin for an
individual heterozygote. Propagate this change up through the pedigree to the
originating founder(s), but not below the chosen "pivot" individual.
(c) A "conditional on observed genotypes" dropping of ibd-typing
alleles, with the refinement that this ignores imputed genotypes.
Inconsistencies thus generated for imputed genotypes are resolved by changing
the imputed genotypes. This procedure will be slow in the presence of a large
number of untyped nonfounders.
(2) Resimulate all untyped x untyped founder mating joint genotypes
conditional on their offspring and other spouses. This is included to speed
mixing, and is not essential.
The resulting proposed constellation is accepted or rejected via the
Metropolis criterion.
MCMC burn-in. In releases prior to version 0.96.0, there was no
burn-in for this Metropolis algorithm, as preliminary empirical tests had
found the results from this program agreed well with "exact" results from
programs such as GENEHUNTER. Subsequently, I have found some pedigrees where
using the starting genotypes from the Lange-Goradia approach does lead to
biased ibd estimates for certain pairs of relatives. Therefore, the
program now performs a number of burn-in iterations (default 100) prior to
those used to estimate ibd. The required number of such iterations
depends on the number of missing genotypes in the pedigree.
Randomized TDT. The randomization test for the global allelic TDT
permutes the transmission table by randomly selecting a single proband-parent
pair and reversing the transmitted and nontransitted alleles. One "shuffle"
of the table involves N such permutations, where N is the number of such
informative parent-proband pairs in the observed pedigrees (this reduces the
correlation between successive tables in the random walk to close to
zero).
Sequential empiric P-values. The Monte-Carlo P-values provided for
the various MC-based tests are produced using the sequential approach
described by Besag and Clifford [1991]. In this refinement, we only generate
as many pseudosamples as is necessary to give a P-value numerator of size
mincount; the denominator is the number of pseudosamples. The practical
effect of this procedure is that if the true P-value is large, then
relatively few pseudosamples are generated to give a less precise estimate of
this uninteresting value. Besag and Clifford suggest a value for
mincount of 10-20. It is necessary to set a maximum denominator to avoid
excessive computation for "highly significant" results.
Other empiric P-values. An exception to this is the algorithm used
for empirical P-values for the APM. Here, a P-value for each family is
simulated at the same time as the mean and variance. These P-values were
previously combined using the procedure due to Fisher [Hedges and Olkin
1985], that is, twice the sum of the natural logarithms of the P-values was
treated as a chi-square variate with 2*N degrees of freedom, where N is the
number of contributing families. This does not seem to be particularly
powerful, so each P-value is now inverse-normal transformed to a Z-score, and
these combined in an unweighted fashion [Hedges and Olkin 1985].
The program reads commands from standard input, and writes results to
standard output. Therefore, the program can be run interactively, or if a
series of commands is to be found in a file, in batch mode. If the input file
was test.in, entering "sib-pair <test.in >test.out"
would perform the commands in test.in, and write results to test.out.
A command is a single line of keywords, locus names and/or variable
values. If a "\" character is the last word of a line, the next
line is interpreted as a continuation of the previous command. Sib-pair is
case-sensitive, so that the keyword "READ" is not equivalent to "read".
Commands are either global, which can be entered at any time; descriptive
(set impute, set locus, read pedigree), which must
precede the run statement; the run statement, that causes the
dataset to be read and processed; or analytic, which act only after the
run statement.
One command, set plevel, controls verbosity of output. Some useful
descriptive tables are only printed if plevel is at least 1.
Sib-pair's parser can evaluate simple algebraic and logical expressions
for each record in a datafile, but does not allow complex programming. For
example, multiple actions contingent on a single condition usually have to
each have their own if statement.
A peculiarity of the subsetting statements (keep, drop,
undelete) is that they do not affect algebraic and logical
expressions. Therefore, it is possible to drop a variable, but
still use it to select pedigrees, say.
Global commands
- !|#. The rest of the line is a comment, and is echoed to
standard output.
- %|$. The rest of the line (up to position 80) is a command,
and is passed to the shell for execution.
- clear. Restarts the program, closing all workfiles and zeroing all
arrays.
- help [all|globals|data|analysis]. Prints a brief description of
the commands -- either all, or a subset.
- info. Information about program settings.
- list|ls
[markers|$m|traits|$a|$q]. List of
loci in current analysis.
- show map. Shows the current marker map.
- time. Print time elapsed since start of the program.
- include <command file>. Read in Sib-pair commands from a
file.
- quit|bye. Halts the program.
- set prompt [on|off]. Displays a prompt.
- set ndecimal_points [<nwid>]
<ndec>. The total width (number of characters) of a
quantitative variable written to a new pedigree file defaults to 9 (and
is fixed to 8 for some files, notably MENDEL and FISHER) but can be set
as high as 20 for GAS and LINKAGE format files. The number of decimal
places can be set to ndec.
- set out|plevel
<level>|verbose|on|off. Increasing the
print level causes more information to be printed by almost all procedures.
Print level 1 prints out the identities and genotypes of parents
imputed where the genotype was missing, raw counts of genotypes for the
hwe procedure, expected ibs probabilities for the asp
procedure etc. Print level 2 (or verbose) writes out the
statistics for each simulated dataset for the MC based procedures, the
intrapair variance and ibd sharing for each pair in the sib pair analysis,
etc. Print level -1 omits outputting a list of pedigrees.
- set weight founders|imputed. Weights contribution of each
pedigree to the allele frequencies by the number of typed founders, or
alternatively gives the count of the founder alleles, observed and
imputed.
- set burn-in<number of iterations>. Controls the
number of Markov Chain Monte-Carlo iterations used by the apm
algorithm discarded before estimation commences. Default is 100 iterations.
Setting bur to zero means no burn-in is performed (the old
default).
- set iteration<number of iterations>. Controls the
number of iterations used by the various Monte Carlo algorithms. Default is
200 iterations. Setting ite to zero means the Monte Carlo procedures
are not performed.
- set mincount <minimum numerator of P-value>. Controls
the number used for Monte Carlo simulation of a P-value. Default is 20
pseudosamples with a test statistic more extreme than that for the observed
statistic. Set mincount equal to iter if this is not
desired.
- set seeds <seed1> <seed2><
seed3>. Initializes random number generator seeds to given values,
rather than via system time.
- set tdt bot[h parents]|one
[parent]|first. Limits TDT statistic to cases where either
both parents or at least one parent is typed, or one proband per family
where both parents typed.
- set map function kosambi|haldane. Set the mapping function used
by multipoint analytic and locus file output routines.
- pchisq <chi-square> <degrees of freedom>.
Calculate P-value for central chi-square distribution.
- proportion <numerator> <denominator>
<confidence interval width>. Calculate accurate confidence
interval following Wilson (as described by Agresti and Coull) for a
proportion.
- sml <Frequency of A allele> <Penetrance
of AA genotype> <Penetrance of AB genotype>
<Penetrance of BB genotype>. Calculates recurrence risks
and segregation ratios under a specified diallelic generalized single major
locus model.
- sml <Frequency of A allele> <Mean
for AA genotype> <Mean for AB genotype>
<Mean for BB genotype>
<standard deviation for AA genotype>.
[<AB SD> [<BB SD>]].
Calculates mean, variance components and parent-offspring regression results
under a specified diallelic generalized single major locus model.
- grr <trait prevalence> <Frequency of
A allele> <genotypic risk ratio>
[add|dom|rec]. Calculates recurrence risks
and segregation ratios under a diallelic generalized single major
locus model specified via trait prevalence, ratio of penetrances and
pattern of inheritance (codominant multiplicative, dominant or
recessive).
Algebraic operators and functions
- <value>|<locus> *|/|+|-|^
<variable>|<locus>. Arithmetic
operations combining numerical constants and/or trait values. The
result of an operation involving constants is a single constant, but an
operation involving a trait value results in nobs results (where
nobs is the number of individuals in the pedigree file.
- <value>|<locus> <|>|ge|le|
eq|==|ne|^=|and|or <variable>|<locus>. Logical
operations comparing numerical constants and/or trait values.
- if <logical expression>
then <action>
[else if <logical expression> then
<action>]...
[else <action>]. Conditional evaluation of expressions.
Note that if statements cannot be nested.
- log|sqrt|exp|sin|cos|tan|asin|acos|atan|abs|int|round
<variable>|<locus>. Functions
acting on numerical constants and/or trait values.
- rand|rnorm. Produce a random value from U(0..1) or N(0,1).
- istyp|untyp <marker>. Test if genotyped
at given marker.
- numtyp|anytyp|alltyp. Show number of markers an individual
is genotyped at, or indicate whether genotyped at any one or all marker
loci.
- male|female|isfou|isnon. Test sex and founder status of
individual.
- num|nfound. Number of members and number of founders of the
pedigree containing an individual.
- famnum|index. Position of pedigree and of individual in the
active dataset.
Data Declaration commands
- set datadirectory <pathname>. Sets directory to be
searched for pedigree files.
- set workdirectory <pathname>. Sets directory to which
temporary files are written.
- set impute off|on|<level>. Toggles
imputation routine.
- Imputation level 0 (off) does not impute genotypes.
It does generate legal genotypes for all untyped individuals, that are
used for the MCMC IBD estimation. This approach is slowed by, and can
fail (stochastically) in large pedigrees. In this case, the imputation
level can be set to -1 (completely off!).
- Imputation level 1 (on) imputes single individual's
genotypes if unambiguous. All missing genotypes are silently imputed
for the use of the MCMC IBD routine via sequential imputation and
application of the Lange-Goradia algorithm (see level 3).
- Imputation level 2 imputes the genotypes of a
spouse pair where the exact owner of each genotype is ambiguous eg both
parents unknown, children 1/2, 3/4 so parental phenosets are each
{{1/3},{2/4}}. Of limited use.
- Imputation level 3 makes available all the imputed values
for the missing genotypes (in pedigrees where at least one member is
typed at the locus). The missing genotype is replaced by the most
likely genotype conditional on the typed members of the pedigree, and
those genotypes imputed at that time -- single locus sequential
imputation. Imputation proceeds through the untyped founders,
according to the collation order of the ID, then nonfounders,
according to the collation order of parental IDs.
- set errordrop off|on|<level>. Toggles
automatic deletion of genotypes that give rise to a Mendelian inconsistency,
either an entire nuclear family (level 1), or an entire pedigree (level 2,
the default).
- set checking off|on. Toggles the first level testing
for Mendelian inconsistencies within nuclear families.
- set locus <locus name> <locus type>.
Declares position (by order within list), name and type of locus within
pedigree file. Locus type may be either:
| marker | an autosomal (fully) codominant marker
|
| xmarker | X-linked codominant marker
|
| quantitative | quantitative (or interval or ordinal) trait
|
| affection | binary trait
|
It is best to avoid a locus name containing reserved characters (eg
"+-*/()^"), if algebraic manipulation of that variable will be required
(othewise quotation of the name is required). Names identical to commands
also cause trouble unless protected by brackets.
- loci <command file>. Read in Sib-pair locus and pedigree
file declarations from a file.
- read locus linkage <locus file name. Read locus
names, types and map positions from a Linkage-format locus (.dat)
file. Does not recognise factor coding of genotypes.
- read pedigree <pedigree file name>|inline.
Reads a GAS type pedigree file either from an external file, or inline
following the command. The inline data is terminated by a line containing
";;;;".
- read linkage <pedigree file name>|inline.
Reads a LINKAGE type pedigree file.
- set sex on. Creates a quantitative dummy variable for sex (field 6
in pedigree file).
- run. Reads in pedigree file and creates working pedigree
file. Imputes genotypes if requested.
- set map <pos1>...<posN>. Set map positions for
the marker loci. This will overwrite any original map positions.
- set distances <dis1_2> <dis2_3>...<posN-1_N>.
Set interlocus map distances map positions for the marker loci. Distances
are in centiMorgans. This will overwrite any original map positions.
Analysis and data manipulation commands
- keep <loc1>...[<locB> to
<locC>]...
[$mar|$qua|$aff]...<locN>. Retain loci in
subsequent analysis. Consecutive loci can be summarized as a range, as can
all members of a particular class of locus type (marker,
quantitative, affection) via a class ($type) token. Note
that dropped variables can still be used in algebraic and logical
expressions.
- drop <loc1>...[<locB> to
<locC>]...
[$mar|$qua|$aff]...<locN>. Exclude
loci from analysis. Note that dropped variables can still be used in
algebraic and logical expressions
- undelete [<loc1>...[<locB> to
<locC>]...
[$mar|$qua|$aff]...<locN>]. Return
previously dropped loci to analysis. Default is to undelete all dropped
loci. This is not the reverse of the delete command.
- select [containing|exactly
<nprobands>] [where] <a logical
expression>. Select pedigrees containing one or more individuals
with a trait value meeting the criterion.
- select pedigree [[not] in]
<ped1>...<pedN>. Select pedigrees included or
excluded from a list of pedigree names.
- unselect. Returns all pedigrees excluded by a select command back
to the analysis.
- pack. Permanently delete all pedigrees excluded by a select
command from the work file.
- edit <pedigree> <person>|all
<trait>
to <value> [<new value>]. Allows editing
of trait values or genotypes. The all keyword performs the
action on all members of that pedigree.
- delete <pedigree> <person>|all
Sets all data to missing for a specified individual. The
all keyword performs the action on all members of that pedigree.
- delete [<locus1>...<locusN>]
[whe|where] <a logical expression>.
Sets specified data to missing for all individuals meeting particular
criteria.
- recode <marker>
<all1|value1>...<allN|valueN>
to <new allele|new value>. Allows pooling of marker
alleles prior to subsequent analysis.
- combine <marker1> [...<markerN>]
[<threshold>]. Pool rare alleles for a marker into one new
allele. "Rare" defaults to a frequency of 5%, but can be changed
via the last parameter on the command line.
- transform <xtrait> <divisor>
<subtractand> <power> <lower
threshold> <higher threshold>. This transform the
quantitative trait xtrait as:
boxcox({xtrait-subtractand}/divisor)
where boxcox() is (a slightly altered) Box-Cox transformation, so that:
- if power=0, the transformation is log({x-s}/d);
- if power=1, it is {x-s}/d;
- and otherwise [({x-s}/d)p-1]/p.
The resulting transformed value can then be truncated above or below using a
specified low or high threshold.
- standardize <trait> [familywise]. Replace each
trait value with its Z-score, ie (x-xbar)/sd, where
xbar is the total sample mean, and sd the total sample standard
deviation. This can also be performed using the individual's family mean and
standard deviation, if the fam keyword is included.
- adjust <ytrait> on <xtrait>
[to <adjustment value of xtrait|m|f>].
Performs linear regression of quantitative trait ytrait on quantitative or
binary trait xtrait (or sex, if sex is set to on), calculates
residuals, and adds adjustment value or, if not specified, the mean
value of xtrait. The residuals then replace the original values of
ytrait. A multiple regressive adjustment of Y on
X1 and X2 requires
sequential adjustment of Y on
X2,X1 on
X2, and then Y on the adjusted
X1.
- residuals <ytrait> on
<loc1>...[to]...<locN> [complete_obs].
Replace quantitative trait with the residuals from the multivariate
regression on the list of predictors (which may include the average
allele length of a marker locus). The com option means only
individuals with no missing values for any of the listed traits will be
updated. Otherwise, missing values are replaced with the sample mean
for that phenotype when calculating the predicted value.
- predict <ytrait> on
<loc1>...[to]...<locN> [complete_obs] Replace
quantitative trait with the predicted value from the multivariate
regression on the list of predictors (which may include the average
allele length of a marker locus). The com option means only
individuals with no missing values for any of the listed traits will be
updated. Otherwise, missing values are replaced with the sample mean
for that phenotype. when calculating the predicted value
- impute <ytrait> on
<loc1>...[to]...<locN> [complete_obs] Replace
missing quantitative trait values with the predicted value from the
multivariate regression on the list of predictors (which may include the
average allele length of a marker locus). The com option means
only individuals with no missing values for any of the listed predictor
traits will be updated. Otherwise, missing values are replaced with the
sample mean for that phenotype when calculating the predicted
value.
- kaplan-meier <age-at-onset> <censor>
[residuals]. Prints the product-limit estimator for the survivor
function for the quantitative trait age-at-onset, where censor
is the binary outcome trait, which is affected when
age-at-onset represents the age at which the individual first expressed
the trait. The age-at-onset is replaced by a nonparametric residual
when requested. If affected, this is:
sgn(1-H(t)).(-2(1-H(t)+log(H(t))))1/2
If unaffected:
-(-2H(t))1/2
where H(t) is the Nelson-Aalen estimate of the integrated
hazard function at that age t.
- rank <trait> <rank>. Write the ranks
of a quantitative trait to the quantitative variable rank.
- nuclear [maxsibs] [grandparents]. Split pedigrees
into component nuclear families, duplicating individuals as necessary. If
maxsibs is set, then sibships containing more than maxsibs members
are truncated. The gra option includes the grandparents as well.
- subpedigrees. Split nominal pedigrees into component true
pedigrees. Sib-pair normally can analyse a group of individuals with the same
pedigree ID, even if they are not all related. This command splits such
groups into uniquely named formal pedigrees.
- prune [<binary trait> [|<quantitative
trait> over|under <threshold>]].
Reduce pedigree to contain probands and minimum number of connecting
relatives.
- cases <locus>.
Reduce pedigree to unrelated individuals with non-missing values at the
trait i.e. the informative founders, and any informative nonfounders
who are not directly related to any individuals already selected.
- print [where] <a logical expression>.
Print trait values for individuals, with a combination of trait values
meeting the criterion.
- write [<pedigree file name>]. Writes a GAS type
pedigree file from the current dataset. Default is to screen.
- write pap. Writes the required pedigree files trip.dat
and phen.dat (note that you may have to sort
trip.dat).
-
| write | pedigree|gas
| <pedigree file name> |
|
| | arl | | [par|all]
|
| | asp|tcl | |
|
| | crimap|tcl | |
|
| | dot | |
|
| | fisher | |
|
| | gda | | [all]
|
| | linkage|gh | | [dummy]
|
| | mendel | |
|
| | phe | |
|
| | sage | |
|
Use of the keywords pedigree or gas writes a GAS type
pedigree file from the current dataset. Quantitative values are written
as F9.x or F8.4. The keyword gda writes a GDA Nexus datafile
containing all current marker genotypes for founders. If the keyword
all is added, nonfounders will be included as well, but the
"gdatype" format will not differentiate between relatives. Similarly,
arlequin writes a data file for the program Arlequin containing
haplotypes from one informative child per family, or two parents of such
a child if the par keyword is added. Only if the all
keyword is added will all genotyped individuals be printed. The keyword
linkage writes a pedigree file from the current dataset suitable
for use by the LINKAGE (and FASTLINK) programs (note that if a
quantitative trait value is zero -- that is nonmissing -- it is recoded
to 0.0001); aspex (or tcl) writes a linkage style pedigree
file but with the marker locus names as the first line, as the ASPEX
programs prefer; gh writes a linkage style pedigree file with a
dummy affection trait as the first trait and all the quantitative traits
last, with "-" for missing quantitative trait values. The
dummy option added to linkage or gh writes a dummy
affection locus as the the first trait (everybody affected). The
sage keyword writes a pedigree file from the current dataset
suitable for use by the program FSP included in the SAGE package;
mendel writes a pedigree file from the current dataset suitable
for use by the programs MENDEL or SIMWALK; fisher writes a
pedigree file from the current dataset suitable for use by the program
FISHER; phe writes the "pheno.dat" style file required by
Mapmaker-Sibs.
- write map mendel|merlin|loki
<map file name>. Writes out the map file required by
MENDEL 4.0, MERLIN or LOKI.
- write locus pap. Writes the required locus files
header.dat and popln.dat.
-
| write locus | aspex|tcl
| <locus file name> |
|
| | fisher | |
|
| | gas | |
|
| | linkage|gh | | [dummy]
|
| | loki | |
|
| | mendel | |
|
| | merlin | |
|
| | sage | |
|
| | sib-pair | |
|
Use of the keyword gas writes a GAS type locus file from the
current dataset; linkage writes a locus file from the current
dataset suitable for use by the LINKAGE (and FASTLINK) programs;
gh writes the same as linkage save that map distances are
in cM. The dummy option is used when the first trait is a dummy
trait generated by write linkage <file> dummy. The keyword
loki writes a control file for LOKI's prep program;
sage writes a locus file from the current dataset suitable for
use by the program FSP included in the SAGE package; mendel
writes a locus file from the current dataset suitable for use by the
programs MENDEL or SIMWALK; fisher writes a locus file from the
current dataset suitable for use by the program FISHER; merlin
for MERLIN; tcl or
aspex writes the tcl command file required by ASPEX programs such
as SIB_PHASE; sib-pair writes a Sib-pair style script.
- generations [<quantitative trait>]. List
founders/marry-ins and sibships by generation number for all pedigrees,
(over)writing the generation number to a quantitative trait if
requested.
- haplotypes [<binary trait>]. This (still
experimental) routine displays nuclear family (plus grandparental, where
present) haplotypes as an ASCII pedigree drawing.
- ancestors <binary trait> [|<quantitative
trait> [over|under <threshold>]]. This
prints the IDs of the ancestor (and ancestral mating) shared by the greatest
possible number of probands in a family.
- frequencies|describe [[<codominant
marker>| <binary trait>| <quantitative
trait>]...[to]...<trait>]. Print allele
frequencies for marker loci, segregation ratios for binary trait, or
means, variances, familial correlations and a sibship variance test for
a quantitative trait. Default is to describe all loci.
- count [where] <a logical expression>.
Count individuals, and sibships and pedigrees containing such
individuals, with a combination of trait values meeting the criterion.
- print [where] <a logical expression>.
Print phenotype data for individuals with a combination of trait values
meeting the criterion.
- tab <trait 1>...[<trait N>].
Print contingency table for one, two or N traits, along with contingency
chi-square, Kruskal-Wallis test or odds ratio if appropriate.
- kruskal-wallis <quantitative trait> <trait>.
Print table of means for the quantitative trait for each level of
factor, along with the Kruskal-Wallis chi-square.
- regress <ytrait> on
<x1>...[to]... <xN>. Performs linear or
logistic regression of trait ytrait on set of loci x1...xN. If an x
variable is a marker genotype, that independent variable is the
mean allele size in the genotype.
- mixture <quantitative trait> [<Number of
distributions> [normal|pooled_normal|exponential|poisson]]. Estimate
mixing proportions, means and standard deviations for a 1..5 component
mixture model describing the specified quantititative trait. The default is a
mixture of Normal (Gaussian) distributions with different means and
variances, but a common variance can alternatively be specified. Other
distributions available are the exponential and Poisson. A line-printer type
histogram is produced.
- kinship [inbreeding|pairwise]. Write the numerator
relationship matrix (matrix of coefficients of relationship) for each
pedigree in a lower triangular form or as a list of pairs (in the latter
case, the coefficient of fraternity is also printed). Alternatively, if
requested, print a list of individuals with a non-zero inbreeding
coefficient.
- ibd <codominant marker> [pairwise]. Write the
estimated mean identity-by-descent sharing at a marker for all relative pairs
in a pedigree as a lower triangular matrix or a list of pairs.
- ibs <codominant marker> [pairwise]. Write the
estimated mean identity-by-state sharing at a marker for all relative pairs
in a pedigree as a lower triangular matrix or a list of pairs.
- hwe [founders]. Prints chi-square statistic for
Hardy-Weinberg equilibrium for all marker loci. Analysis may be restricted to
founders. The mean IBS sharing for all typed matings is also calculated, and
compared to its expected value. This latter test may allow detection of
homogamy or assortment.
- cksib. Lists all sib pairs, and the mean of IBS at all
marker loci where both members of the pair are typed at the marker,
comparing this to that expected if related as specified by the pedigree
structure. The output is to the standard output.
- share [pairs]. Lists all relative pairs, and the mean of
IBS at all marker loci where both members of the pair are typed at the
marker, comparing this to that expected if related as specified by the
pedigree structure, allele frequencies and linkage map. The output is to the
standard output. The default lists individuals whose Z score measuring
deviation from expected exceeds 1.65 with any other relative. The
pairs option prints the statistic for each deviant pair, or all pairs if
output is set to verbose.
- davie <binary trait> [<proband
indicator>]. Print segregation ratios and standard errors for a binary
trait, adjusted for the ascertainment scheme. Probands are indicated by being
affected at the proband indicator "locus". If no proband indicator
variable is given, complete ascertainment is assumed (equivalent to "davie
trait trait").
- varcomp <quantitative trait> [ae].
Performs MVN variance components analysis for a quantitative trait using
all phenotyped individuals. Currently fits ADE, AE and E models. The
ae option fits AE and E only.
- dis [[<marker locus 1>] <marker locus
2>]. Estimates frequencies of two locus haplotypes based on
individuals with two informative parents. The loci are assumed to be tightly
linked, so that four parental haplotypes may be inferred. If no markers are
specified, the sequence of pairs of markers in the list of loci is produced
(ie marker1 with marker2, marker2 with marker3...). If one marker is
specified, then the pairing of all other markers with this index marker is
analysed.
- assoc <binary trait> [|<quantitative
trait> [over|under <threshold>]]
[founders] [genotypic]. For a binary or dichotomised
quantitative trait, prints chi-square statistics for association for
all marker loci versus the trait, either affected versus
unaffected if the trait is binary, or above or below the
threshold if the trait is quantitative. A second table in the output
shows the results from the reconstruction-combined TDT within
informative sibships. Also prints F statistics assuming trait is marker
for different subpopulations. For a quantitative trait, prints the model
and residual sums-of-squares and allelic means with naive standard
errors from an additive allelic ANOVA model. Monte-Carlo empiric
P-values are produced for either analysis. Analysis may be restricted
to founders. Genotypic rather than allelic analyses can also be specified,
using the genotype flag.
- homoz [<binary trait> [|<quantitative
trait> over|under <threshold>]]. Prints the
asymptotic Z statistic and a one-sided MC P-value for whether homozygosity at
all marker loci is increased in probands, either affected if the trait is
binary, or above or below the given threshold if the trait is
quantitative.
- tdt <binary trait> [|<quantitative
trait> over|under <threshold>]
[cutoff <cutoff>] [mat|pat]. Prints
transmission-disequilibrium statistics for all marker loci versus the
trait, where an index person is either affected with a binary
trait, or whose value for a quantitative trait exceeds the given threshold.
Since binary traits are coded internally as 2=y and 1=n, an
analysis using unaffecteds as proband can be performed as tdt
<binary trait> under 2. Similarly, in unascertained
families, tdt <binary trait> over 0 tests for
segregation distortion. Calculation of the TDT statistic can be
restricted to pairs of cells whose total is greater than cutoff,
eg 5, and to the maternal or paternal contributions, if parent-of-origin
effects are suspected.
- schaid <binary trait> <marker>
[<allele>]. Performs the Schaid and Sommer [1993]
genotypic risk ratio test for familial association under the assumption
of Hardy-Weinberg equilibrium. Only one allele (defaulting to the
commonest) is tested versus all others, and two penetrance ratios
(GRR2=f2/f0 and
GRR1=f1/f0) are estimated, along with
the LR chi-square test that GRR2=GRR1=1.
- asp <binary trait> [|<quantitative
trait> over|under <threshold>]. Prints
IBS-based affected (full and half) sib-pair statistics for all marker
loci versus the trait. It also prints the mean IBD sharing for full
sibs, along with the exact (binomial) two-tailed P-value for the "mean" test.
All possible sib-pairs are used, and are treated as independent.
- apm <binary trait> [|<quantitative
trait> over|under <threshold>]
[ibd|ibs]. Prints APM statistics for all marker loci
versus the trait.
- sibpair|he1|he2|vis <quantitative
trait>| <binary trait> [<Weight
variable>] [sim] [mean<trait mean>]
[var | sd <trait variance or SD>]
[cor<trait sibling correlation>].
Performs Haseman-Elston regressions (Sham & Purcell [2000] as the
default, but Visscher-Hopper [Visscher & Hopper 2001],
traditional and "new" Haseman-Elston also available)
for all marker loci versus the trait using full and half-sib
relative pairs. The contribution of each pair can be weighted by the
mean of their values at a quantitative trait. Empirical P-values can be
simulated, if requested. For the S+P regression, the "true"
population trait mean, variance and sibling correlation can be
specified, to facilitate analysis of selected samples.
- twopair <quantitative trait>| <binary
trait> <marker locus 1> <marker locus 2>
<theta12>. Performs Fulker & Cardon's Haseman-Elston
interval regression for first and second marker loci versus the trait using
full- and half-sib relative pairs. The recombination distance between the
markers theta12 must be given. Haseman-Elston regression is performed using
ibd estimated at ten points in the interval.
- qtlpair <quantitative trait>|
Performs variance components linkage analysis for all marker loci versus
the trait using full-sib relative pairs. Both the polygenic background
and the QTL are modelled as additive genetic.
- linkage [<marker locus 1> [<marker locus
2>]]. Performs Elston and Keats sib pair linkage analysis for
codominant markers. Default is adjacent pairs of markers (ie marker 1
with marker 2, marker 2 with marker 3...). If one marker is named, then
gives estimate of recombination distance to all other markers.
The following script performs a number of analyses on a dataset containing
four loci.
Test.in
set work c:\tmp\
set weight founders
set out verbose
set impute on
set locus quant quantitative
set locus trait affection
set locus marker1 marker
set locus marker2 nam
read pedigree test.ped
run
freq
mix quant 2
assoc trait
tdt trait
ass quant
sibpair quant
recode marker1 126 128 to 999
freq marker1
tdt trait
apm trait
sibpair quant
! create a new trait
set locus new_quant qua
if (quant le 0) then new_quant= -sqrt(-2*quant) else new_quant=log(quant)
! adjust the binned allele sizes of marker
if (marker1 ne 999) then marker1=marker1+1
drop trait quant
write pedigree testout.ped
The data set contains one record (newline character delimited) per
individual. Records must be sorted into pedigrees. Records take the format
used by GAS:
pedigree-id person-id father-id mother-id sex-of-person
locus-value-1...locus-value- N
A pedigree ID may be up to 10 alphanumeric characters, and an individual's
personal ID up to 8 characters. Missing values are denoted x (or
.), and represented internally as a trait value of -9999. Locus values
for a binary trait are y (expresses trait), n (does not express
trait). Sex takes the values m (male) and f (female), and may
be missing. Alleles at a marker locus are integers between 1 and 999.
A pedigree file may contain a comment at any time, prefaced by ! or #, and
may contain a locus header of the form (though this has no function and is
included to allow compatibility with the GAS pedigree format):
pedigree locus
<locus-name-1>...<locus-name-N>.
If only one parent of an individual is specified in the pedigree file, a
dummy record and ID number for the other parent is generated by the
program.
Here is the data set analysed by the script test.in:
Test.ped
! test pedigrees including one halfsib in pedigree 1000
! Marker 1 Marker 2
! The seven mating types: 1000-1 x 1000-2 Type VII Type III
! 1000-1 x 1000-3 Type VI Type II
! 1001-1 x 1001-2 Type IV Type V
!
1000 1 x x m 10 y 126 132 1 1
1000 2 x x f 10 n 128 130 1 2
1000 3 x x f 25 n 128 132 2 2
1000 4 1 2 f 20 y 126 128 1 1
1000 5 1 2 m 30 y 130 132 1 1
1000 6 1 2 m 40 n 128 132 1 2
1000 7 1 2 f 50 n 126 130 1 2
1000 8 1 3 f 60 n 126 128 1 2
1000 9 1 3 m 40 y 132 132 1 2
1001 1 x x m 20 y 124 124 1 2
1001 2 x x f 30 n 126 128 1 2
1001 3 1 2 f 40 n 124 128 1 1
1001 4 1 2 m 30 n 124 126 1 2
1001 5 1 2 m 40 n 124 126 2 2
1001 6 1 2 m 40 n 124 128 1 2
1002 1 x x m 10 y x x x x
1002 2 x x f 40 n x x x x
1002 3 1 2 m 30 n 126 126 1 2
1002 4 1 2 m 60 n 126 126 1 1
1003 1 x x m 20 n 126 126 1 2
1003 2 x x f 25 n x x x x
1003 3 1 2 m 40 n 126 126 1 2
1003 4 1 2 m 15 y 126 126 1 2
! end-of-pedigrees
The pedigree file written by Sib-pair contains the original records plus
any additional dummy records for missing parents of nonfounders. It is sorted
by founder/nonfounder status, generation number, and the collation order of
the parental IDs and individual ID.
How do I?
- Get more output: Set the print level higher.
#
# Get full TDT tables but summary results for association analysis
#
set ple 1
tdt trait
set ple 0
ass trait
- Analyse only selected traits: Use the
drop then
undelete commands to specify loci or ranges of loci to be included or
exclude from particular analyses:
# Drop out year of birth and don't show any allele frequencies
drop yob $m
describe
undelete
- Write out only the markers: Use keep $m, then write
pedigree. Note that this will undelete previously deleted markers.
- Delete marker data for particular individuals: Use keep $m,
then delete <pedigree> <person>, followed by
und.
...
set loc errprob qua
keep $m
delete where errprob>0.9
und
- Use the new parser: This hopefully reduces the
number of steps in data manipulation required for a complete analysis.
#
# Create a new variable that is a function of
# three existing quantitative variables
#
set loc b1 aff
set loc q1 qua
set loc q2 qua
set loc q3 qua
read ped ex.ped
run
set loc new_var qua
new_var=log(q1+q2)/q3^2
if (male) then new_var=new_var+10
#
# Select a subset of pedigrees where two or more probands
# meeting multiple criteria
#
select containing 2 where new_var>35 and q1 le q2 and isnon
write newped.ped
- Use variable names in formulae when the variable name shares
the first three letters with a command eg "trait" and
"transform": surround the variable name with brackets.
- Analyse multiple pedigree files: Jobs can be stacked, providing
each begins with the clear command. The loci will have to be declared
each time however.
- Delete a marker that is giving too many mendelian
inconsistencies: Use the drop command on that marker before the
run command.
- Ignore error messages from a marker that is giving too many mendelian
inconsistencies: Drop the marker out before error checking as before,
then use the undelete command before the hopefully robust type of
analysis chosen. Alternatively set checking off and set impute
-1 will turn off checking for all markers.
- Test for segregation distortion: Do a TDT with everyone
affected, for example, set sex on then tdt sex over -1.
- Log transform a quantitative trait: Use tra trait 1 0
0 or trait=log(trait) (slower).
- Get a histogram for a quantitative trait: Use the
hist command (this is a synonym for mixture trait 1).
Setting the print level higher for mixture also prints out
posterior probabilities of membership of the different distributions --
useful for choosing thresholds.
- Print genotype frequencies: Set plevel to 1 so that the
full table is printed when the hwe command is used.
- Remove unrelated individuals or nuclear families that have become
disconnected from the main pedigree, although they have the same pedigree
ID: Use the subped command, followed by select $n gt 20,
or some other suitable number that will keep only the main pedigree.
- Do a multipoint ASP linkage analysis: Write the appropriate
format pedigree and locus file, and call another program like SIB_PHASE:
# Write a locus file
write locus aspex batch.tcl
# Write out the pedigrees as nuclear families (if multigenerational)
nuclear
write aspex batch.ped
# The resulting output will be included in the Sib-pair output
$ sib_phase -f batch.tcl batch.ped
$ rm batch.tcl batch.ped
- Which TDT result should I trust? The genotypic TDT is currently
a more experimental test. In the case of nuclear families with typed parents,
it reduces to a simulation based CPG GRR test. In larger pedigrees with
multiple generations of affecteds, matings must have both parents
genotyped before they are included in the simulation, but this is done
over the complete pedigree.
- What does the "founder" option for the allele frequencies give?
Since a simple count of alleles in typed founders can miss alleles segregating
in the pedigree (and thus inherited from untyped founders), I have provided
a method that enumerates all alleles in each pedigree, but weights the
contribution of the pedigree by the number of founders it carries (that is,
the number of representatives from the population whose allele frequencies one
is trying to estimate). So, in a simple example,
1 1 x x 1 1 x x
| | | |
+--+--+ +--+--+
| |
+-----+-----+ |
1 2 1 2 1 2 1 3
the contributions from each family would be weighed equally, as each
contains two founders. For allele 1 for example,
Family 1 Family 2
2 * 5/8 + 2 * 3/4
-------------------- = 11/16
4 (total founders)
| | Allele 1 | Allele 2 | Allele 3
|
|---|
| The naive estimate is: | 8/12 (.67) | 3/12 (.25) | 1/12 (.08)
|
| The weighted estimate: | 11/16 (.69) | 3/16 (.19) | 2/16 (.13)
|
| The imputation estimate: | 6/8 (.75) | 1/8 (.12) | 1/8 (.13)
|
| The MLEs (MENDEL USERM13): | .6254 | .2182 | .1564
|
In this example, the frequencies of the 2 and 3 alleles are better
estimated. In the imputation estimation approach, the untyped parents were
imputed as 1/2 and 1/3 respectively.
- What do I do if I have covariates or liability classes?
Eventually the various binary trait analyses Sib-pair does will allow for
covariates. At the moment, the best thing you can do is create multiple
phenotypes eg male diabetes and female diabetes, with the phenotype set
to missing appropriately. Then one can do the analysis within each
stratum, and pool test statistics in various ways used in meta-analysis.
In the case of quantitative traits, analysis of residuals formed by
adjusting for covariates will carry you a long way.
Regression (multiple) is performed by AS (Applied Statistics) 164,
which uses modified Givens rotations to perform weighted least squares
regression including linear constraints. It is also used to give
generalised linear models (poisson and binomial regression) via IRLS.
The random number generator is the well known AS 183. The approximate
randomization routine is styled after general templates described by
Noreen [1989]. Mixtures of distributions are fitted using AS 203.
Various standard distributions are evaluated using AS 3, 66, 111.
Sib-pair currently is limited to a maximum of 600 (or 1000 or 2000)
individuals per pedigree, 120 (or 1000) locus values or fields (eg 60
codominant genotypes, 120 binary traits etc), and 40 (or 60 or 100)
possible alleles per codominant locus. The "low memory" version of the
program stores phenoset for each individual in a pedigree is stored in a
direct access file, with resulting overheads. Write to me if this sounds
of use. The standard version performs imputation in memory.
The Monte-Carlo
based routines are computationally intensive. Generally speaking 200-300
iterations of such a routine are sufficient to give a good estimate of a
mean or variance (as in the apm routine), but 1000 iterations or more
are advised for an accurate P-value. Using "set iter 0" will provide
only the parametric estimators, e.g. for screening purposes.
Sib-pair does not know about mitochondrial DNA.
There are only a few multipoint procedures (twopoint and
haplotype).
The program is (fairly) standard Fortran 77, and runs successfully on
PCs (under DOS, NT or Linux), SUN Sparcstations, DEC Alpha and HP9000
workstations. The current DOS version is compiled via f2c and the DJGPP
port of the GNU C compiler, and is compressed using djp, an LZ0
executable compressor.
ACKNOWLEDGEMENTS
This program was developed while the author was an Australian
National Health and Medical Research Council Neil Hamilton Fairley
Postdoctoral Fellow and later a Research Fellow. This included a period
working in the Genetic Epidemiology Division of the Johns Hopkins
University School of Public Health and in the Epidemiology Unit at the
Queensland Institute of Medical Research.
- Andersen PK, Borgan O, Gill RD, Keiding N (1993): Statistical models
based on counting processes. New York: Springer Verlag.
- Besag J, Clifford P (1991): Sequential Monte Carlo p-values.
Biometrika 78: 301-304.
- Bishop DT, Williamson JA (1990): The power of identity-by-state methods
for linkage analysis. Am J Human Genet 46: 254-265.
- Blangero J, Samollow PB, Rocha MB, Hixson JE, Rogers J (1995): The IGF1
locus is a major determinant of serum osteocalcin levels in Mexican
Americans. Fourth Annual Meeting of the International Genetic Epidemiology
Society, Snowbird, Utah, June 20- 22, 1995.
- David F, Johnson NL (1956): Some tests of significance with ordered
variables. J R Statist Soc B 18: 1-20.
- Davie AM (1979): The 'singles' method for segregation analysis under
incomplete ascertainment. Ann Hum Genet 42: 507-10.
- Davis S, Schroeder M, Goldin LR, Weeks DE (1996): Nonparametric
simulation-based statistics for detecting linkage in general pedigrees. Am
J Hum Genet 58: 867-80.
- Excoffier L (2001): Analysis of population subdivision. In:
Balding DJ et al. Handbook of Statistical Genetics. London: Wiley
and Sons. 271-307.
- Fain PR (1977): Characteristics of simple sibship variance tests for the
detection of major loci and application to height, weight and spatial
performance. Am J Hum Genet 42: 109-20.
- Filliben J (1975): The probability plot correlation coefficient test
for normality. Technometrics 17: 111-117.
- Haberman SJ (1979): Analysis of quantitative data. Volume 2. New
developments. New York: Academic Press.
- Haseman JK, Elston RC (1972): The investigation of linkage between a
quantitative trait and a marker locus. Behav Genet 2:
3-19.
- Hedges LV, Olkin I (1985): Statistical methods for meta- analysis. San
Diego: Academic Press.
- Kaplan NL, Martin ER, Weir BS (1997): Power studies for the
transmission/disequilibrium tests with multiple alleles. Am J Human
Genet 60: 691-702.
- Keats BJ, Elston RC (1986): Determination of the order of loci on
the short arm of chromosome 11 using two and three locus linkage
analyses of pedigree and sib pair data. Genet Epidemiol Suppl 1:147-52.
- Knapp M, Wassmer G, Baur MP (1995): The relative efficiency of the
Hardy-Weinberg Equilibrium-likelihood and the Conditional on Parental
Genotype-likelihood methods for candidate-gene association studies. Am J
Hum Genet 57: 1476-1485.
- Knapp M (1999): The transmission/disequilibrium test and
parental-genotype reconstruction: the reconstruction-combined
transmission/disequilibrium test. Am J Hum Genet 64: 861-870.
- Kruglyak L, Daly MJ, Reeve-Daly MP, and Lander ES (1996): Parametric and
nonparametric linkage analysis: a unified multipoint approach. Am J Hum
Genet 58: 1347-1363.
- Laird N, Horvath S, and Xu X (2000): Implementing a unified
approach to family based tests of association. Genetic Epi
19(Suppl 1): S36-S42.
- Lange K (1986a): The affected sib-pair method using identity by state
relations. Am J Hum Genet 39: 148-150.
- Lange K (1986b): A test statistic for the affected-sib-set method. Ann
Hum Genet 50: 283-290.
- Lange K, Goradia T (1987): An algorithm for automatic genotype
elimination. Am J Hum Genet 40: 250-256.
- Lange K, Matthysse S (1989): Simulation of pedigree genotypes by random
walks. Am J Hum Genet 45: 959-970.
- Lange K (1997): Mathematical and statistical methods for genetic
analysis. New York: Springer-Verlag.
- Noreen EW (1989): Computer-intensive methods for testing hypotheses: an
introduction. New York: Wiley.
- Olson J (1995): Robust multipoint linkage analysis. An extension of the
Haseman-Elston approach. Genet Epidemiol 12: 177-194.
- Olson J, Rao S, Jacobs K, Elston RC (1998): Linkage of chromosome 1
markers to alcoholism-related phenotypes by sib-pair linkage analysis of
principal components. Genetic Analysis Workshop 11. September 8-10, Arcachon,
France.
- Pons O, Chaouche K (1995): Estimation, variance and optimal sampling of
gene diversity. II. Diploid locus. Theor Appl Genetics 90:
122-130.
- Resek RW (1974): Alternative tests of skewness: Efficiency comparisons
under realistic alternative hypothesis. Proc Bus Econ Statist Section Am
Statist Assoc 1974: 546-551.
- Royston P (1993): A pocket-calculator algorithm for the
Shapiro-Francia test for non-normality: An application to medicine.
Statist Med 12: 181-184.
- Schaid DJ, Sommer SS (1993): Genotype relative risks: methods for design
and analysis of candidate-gene association studies. Am J Hum Genet
53: 1114-1126.
- Sham PC, Purcell S (2001): Equivalence between Haseman-Elston and
Variance-Components Linkage Analyses for Sib Pairs. Am J Hum Genet
68:1527-1532.
- Spielman RS, McGinnis RE, Ewens WJ (1993): Transmission test for linkage
disequilibrium: the insulin gene region and insulin-dependent diabetes
mellitus. Am J Hum Genet 52: 506-516.
- Spielman RS, Ewens WJ (1996): The TDT and other family-based tests for
linkage disequilibrium and association [editorial]. Am J Hum Genet
59: 983-989.
- Therneau TM, Grambsch PM, Fleming TR (1990): Martingale-based residuals
for survival models. Biometrika 77: 147-160.
- Visscher PM, Hopper JL (2001): Power of regression and maximum likelihood
methods to map QTL from sib-pair and DZ twin data. Ann Human Genet
65: 583-601.
- Weeks DE, Lange K (1988): The affected-pedigree-member method of linkage
analysis. Am J Hum Genet 42: 315-326.
- Ward PJ (1993): Some developments on the affected-pedigree-member method
of linkage analysis. Am J Hum Genet 52: 1200-1215.
- Ward PJ, Bonaiti-Pellie C (1995): Measuring gene-disease association
using a general pair method. Genet Epidemiol 12: 681-686.
- Whittemore AS, Halpern J (1994a): Probability of identity by descent:
computation and applications. Biometrics 50: 113-117.
- Whittemore AS, Halpern J (1994b): A class of tests for linkage using
affected pedigree members. Biometrics 50: 118-127.
- Young A (1995): Genetic Analysis System, version 2.0 [Computer program].
Oxford: Oxford University.
26-May-2003 (0.99.9)
Fixed infinite loop occasioned by last pedigree in file containing a
pedigree error.
24-April-2003 (0.99.9)
The "del" command now accepts a logical expression defining those
individuals whose data are to be deleted. The loci to be deleted can be
given as a list. The "famnum" and "index" variables now
contain the position of the family and that individual in the dataset.
17-April-2003 (0.99.9)
Prevented calculation of LD between X and autosomal markers.
10-April-2003 (0.99.9)
The "reg" command applied to a binary trait really does
give the logistic regession. The "combine" command combines rare
alleles at a marker into a single new allele.
09-April-2003 (0.99.9)
The "reg" command applied to a binary trait gives the logistic
regression.
28-March-2003 (0.99.9)
Added "read locus linkage" to read locus information from a Linkage
style .dat file.
20-March-2003 (0.99.9)
Recognise "/" as separating alleles in a pedigree file.
28-February-2003 (0.99.9)
Write "mainparams" and data file for Jonathan Pritchard's
structure program ("wri str <datfil>", "wri
loc str <locfil> <datfil>"). X chromosome TDT and
allele frequencies, export locus files.
12-February-2003 (0.99.9)
Fixed bug in ordering of loci for "wri lin" (caused by
introduction of X-linked marker class).
31-January-2003 (0.99.9)
Various changes to allow Mendelian error checking for X-linked markers.
Defined "xmarker" as a class of variable.
23-January-2003 (0.99.9)
Fixed "tab", "dav", and "des" (for binary
traits) so that they respect the new method of selection. Pedigrees are now
marked as active or inactive, but not deleted -- need to "pack"
(or write) for that. Quantitative variable printing in "pri"
repaired.
22-January-2003 (0.99.9)
Fixed "edit" command fallout from addition of X-linked
markers. Added "cas" command to divide pedigrees into
unrelated cases and controls eg founders with information, or one child
of parents with no information etc. The command "var <trait>
ae" fits only the AE (and E) models.
17-January-2003 (0.99.9)
Genotypic association analysis option: "ass <trait> gen"
gets a table of genotype rather than allele counts. The "unselect"
command returns all pedigrees previously excluded by one or more
"select" commands back into the analysis. A new type of marker
"xmarker" added. Still sorting out the Mendelian error checking
for this.
04-October-2002 (0.99.9)
Minimum intermarker map distance for Genehunter locus files fixed -- set
to 0.01 cM.
18-October-2002 (0.99.9)
One too many recombination distances being written to LINKAGE locus
files when a dummy locus specified. Thanks to David Evans for pointing
that out.
17-October-2002 (0.99.9)
Improved nuclear family mendelian checker (back to former level!).
Documented the "edit", "set errordrop", "set
checking" and "delete" commands (as well as
"prop" and "pchisq"). These shortcomings pointed
out by David Evans. Added "wri map merlin" command. A
"xlinked" flag added to the "write locus linkage"
command.
4-October-2002 (0.99.9)
If "set err_drop 2", then long-distance errors cause the entire
pedigree to be set to missing for that marker.
25-September-2002 (0.99.9)
After the "nuc gra" command, a parent with missing parents could
come after a nonfounder parent in the work file, causing a segfault on
subsequent analysis.
18-September-2002 (0.99.9)
The "select pedigree" command allows inclusion or exclusion of
specific named pedigrees from further analysis. The "gener"
command now only lists summary information at the default print level. The
"write pap" was writing MCMC start marker genotypes to the PAP
phen.dat as if they were observed genotypes -- Sandra Hasstedt helped
sort this out. Added a "write map" command, currently only for
MENDEL map files. Changed default tdt to "both" -- both parents
must be present.
29-July-2002 (0.99.9)
The "prune" command reduces a pedigree to the probands and a
minimum number of connecting relatives. The "ancestor"
with increased print level gives number of affected descendants for all
individuals. Some code reorganisation.
01-July-2002 (0.99.9)
The "rank" command writes the ranks for a variable.
26-June-2002 (0.99.9)
Results of "and" and "or" operations with missing
values are now more consistent (eg F && x = F). Bug in
"dis <marker>" form of call to "dis" fixed
(alleles at that marker were scrambled by the exact test). A
"factor <marker> <trait>;" command allows genotypes
to be coded as an quantitative variable (value 1...Ngenotypes).
20-June-2002 (0.99.9)
Can force calculation of Kruskal-Wallis test for RxC table
using "kru <quantitative trait> <factor>".
19-June-2002 (0.99.9)
Can now print out values for selected pedigree members tested for by a
conditional expression. The "tab" command extended to
tables of arbitrary dimension.
12-June-2002 (0.99.9)
MCMC based "exact" test for LD added. Visscher and Hopper
test repaired -- the squared trait sums were not centred (obviously,
only affected unstandardised data). Peter Visscher pointed out and helped
fix this.
28-May-2002 (0.99.9)
Added "numtyp", "alltyp" and "anytyp"
automatic variables that report how many markers an individual is typed
at. Cleaned up syntax for "des", "reg" to allow
ranges of loci.
17-May-2002 (0.99.9)
Added Visscher & Hopper's version of Haseman-Elston
("vis"). The "gener" command can now write the
generation number to a quantitative variable. Check and repair
troublesome locus names, eg duplicates or reserved words.
17-Apr-2002 (0.99.9)
If "0" was used as a missing value for a binary trait, this
was recognised as such: except for algebra. Now automatically recoded
when read in. Thanks to Jacki Wicks for pointing this out.
The "read linkage" command did not have this problem.
05-Apr-2002 (0.99.9)
Locus names as heading when write pedigree to screen. Documented and
improved symmetry (skewness) test.
13-Mar-2002 (0.99.9)
A divide-by-zero error and internal read error (on some compilers) fixed.
Thanks to Alexa Sorant for those reports.
12-Mar-2002 (0.99.9)
One last parser problem evaluating conditional expressions where the
switch statement contains a missing value. When a variable is missing
and is part of an expression other than equal or not equal
("==" and "^="), the expression evaluates to
missing. The "if" statement was incorrectly carrying out the
"then" branch rather than returning an error.
06-Mar-2002 (0.99.9)
Added F statistics to "assoc" command (assumes the binary
trait indicates membership of a subpopulation). Cleaned up output of
"assoc " command when no eligible individuals genotyped.
19-Feb-2002 (0.99.9)
Fixed bug is "istyp" and "untyp" command: these did not
evaluate correctly when starting values for the genotypes were not imputed
(ie "set imp -1" was used).
18-Feb-2002 (0.99.9)
Stopped segfault in RC-TDT when trait missing for any siblings. Genotypic
TDT simulation not done for completely missing data.
15-Feb-2002 (0.99.9)
GH and linkage locus file map distances now written to 2 and 4 decimal
places respectively.
31-Jan-2002 (0.99.9)
Sham and Purcell Haseman-Elston now has slope scaled and intercept fixed
so that slope is estimated QTL genetic variance (as proportion of total).
Also added keywords to "sib" to allow specification of the
population trait mean, variance and sib correlation, so that selected samples
can be analysed. Thanks to Anastasia Iliadou for prompting these changes.
23-Jan-2002 (0.99.9)
Write out control file for Loki's prep (pedigree preparation) program.
18-Jan-2002 (0.99.9)
Allowed estimation of dominance variance component in "var"
command. Added QTLs to "sml" command.
11-Jan-2002 (0.99.9)
Added variance components analysis including QTL linkage analysis for
full sibs.
3-Jan-2002 (0.99.9)
Fixed bug for RC-TDT under Windows , thanks to Lyle Palmer (program attempted
to print contents of an unassigned variable).
29-Nov-2001 (0.99.9)
Added MC P-value for global RC-TDT.
27-Nov-2001 (0.99.9)
Fixed parent-of-origin TDT where parents and child all same genotype --
thanks to Emiko Noguchi for pointing this out. RC-TDT "allowed"
to work when no unaffected children in dataset.
26-Nov-2001 (0.99.9)
Implemented the Reconstructed parents-Combined Transmission
Disequilibrium Test (RC-TDT) of Knapp [1999], available through the
assoc command, where it replaces the old sibship permutation
test. This is essentially identical to the default test for binary
data provided by the FBAT program of Xu, Horvath and Laird [2001].
16-Nov-2001 (0.99.9)
Uninformative sibships (all children same genotype) now stopped from
diluting the sibship permutation test. Calculated genotypes are ordered
by allele size.
15-Nov-2001 (0.99.9)
Reallow mother to precede father in list of parents of an individual.
Pedigree errors now cause that family to be deleted (and an error message
generated), rather than halting the program. Users should check a new line
of the summary output eg:
Number of pedigree errors = 1
Number of deleted records = 4
02-Nov-2001 (0.99.9)
Several versions of how to handle missing data in expressions,
especially where these recode data. At present, can test equality or
inequality of a variable with missing (test missingness), but other
operations evaluate to missing. Therefore, the expression "not
male" is true when sex is missing, but "sex<2"
is missing. Arithmetic expressions resulting in -9999.0 (the
internal missing value code) now give -9999.0001.
11-Sep-2001 (0.99.9)
Fixed bug in evaluation of "<=", ">="
"==" "^=": precedence in complex expressions was
not always correct. Added CPG chi-square to schaid output and
loglinear modelling to the service subroutines (uses IRLS).
Zeroing of genotypes when error dropping on changed so always deletes
all genotypes for that nuclear family rather than attempting to guess
where the error is (deleting child that gave rise to inconsistency).
04-Sep-2001 (0.99.9)
Fixed minor bug in qtdt() when zero subjects in a group: zero-trapped
log routine use extended.
21-Aug-2001 (0.99.9)
Added "<=", ">=" "==" "^="
as synonyms for the logical comparisons.
17-Aug-2001 (0.99.9)
Better headings for "sib" output when plevel equal to 1.
Matching on locus names now explicitly only on first 10 letters (an annoying
feature was that declaring a locus name longer than 10 characters led to the
name being truncated, but a "keep" command would use the full
string). Result from "tab" fixed for case where only one level of
a binary trait is present (eg affected only). Results of logical equality or
inequality with missing now give true or false rather than missing.
14-Aug-2001 (0.99.9)
Added "count" and "linkage" commands. The latter
follows Elston and Keats' approach (as seen in Sibpal!) to sib pair
linkage of codominant markers, and is interesting (and likely to be
replaced with something else). Fixed bug affecting "ne"
keyword (code implementing had been deleted somehow!).
18-Jul-2001 (0.99.9)
Fixed bug in code implementing new Sham and Purcell Haseman-Elston.
17-Jul-2001 (0.99.9)
Made "else if" work. NB: you may not nest if statements, as in:
if b eq 1 then (if c eq 1 then d=1 else d=2) else d=3
The legal equivalent is:
if (b eq 1 and c eq 1) then d=1 else if (b eq 1 and c ne 1) then d=2 else
d=3
or
if b ne 1 then d=3 else if c eq 1 then d=1 else d=2.
14-Jul-2001 (0.99.9)
Quoting blanks in some algebraic expressions no longer crashes the program eg
"+" "".
10-Jul-2001 (0.99.9)
Nicer output from "tab", including results for a single variable
and correct printing of genotypes.
6-Jul-2001 (0.99.9)
Added extra front ends to regress() so can access predicted values (for
imputation) and multiple regression residuals: "predict"
"residuals" and "impute". Fixed segfault occurring
when "nuclear" met families of size 1.
5-Jul-2001 (0.99.9)
The output from "mix" now includes the Filliben correlation as a
test for normality. The command "his" is a synonym for
"mix 1".
28-Jun-2001 (0.99.9)
Adds "untyp()" and "round()" functions.
Expressions involving genotypes evaluated for both alleles where
appropriate, for example:
# If genotype missing, replace with new genotype allowing for different
# binning
if untyp D1S124 then D1S124=D1S124_2-1
20-Jun-2001 (0.99.9)
Fixed bug in famcor() where families of size 1 contributed to the count of
matings. Quoting refined -- a quotation mark starts a new word, regardless
of location. Added attributable risk to schaid procedure.
20-Jun-2001 (0.99.9)
Implemented improved combined Haseman-Elston regression of Sham &
Purcell [2001]. Added quoting so special characters can be part of a
variable name in mathematical and logical expressions.
19-Jun-2001 (0.99.9)
Rejinked selection criteria for TDT probands (1/2 x 1/2 -> 1/2
trios now contribute again) so that genotypic TDT mimics CPG GRR test
(same expectations, uses unconditional chi-square as test statistic, but
simulation reproduces conditional likelihood). Reintroduced "set wei
imp" which counts the imputed and observed founder genotypes -- this
is less accurate in small datasets (eg made up of nuclear families) than
the unweighted or weighted versions.
18-Jun-2001 (0.99.9)
Fixed minor bug in writing MENDEL files: if zero had been used as a
missing value for a binary trait locus in the original pedigree file,
but the "read linkage" command not used, then this
would be replaced by the last locus for the previous person when writing
the MENDEL pedigree file. Added the Schaid and Sommer genotypic risk
ratio test (HWE but not CPG, as latter overlaps the TDT).
14-Jun-2001 (0.99.9)
Fixed minor bug parsing negative arguments to commands (eg "set ple
-1" was read as "set ple - 1"). Precedence of exponentiation
lowered so "int(4.2)^2" gives 16 and not 17! And the size of
evaluable expressions increased.
8-Jun-2001 (0.99.9)
Adds "inht" (inverse hyperbolic tan a.k.a. Fisher-Z transformation).
6-Jun-2001 (0.99.9)
The operator "<" was acting as ">" - corrected. Distributed DOS,
Windows32/NT and Linux binaries are now compressed using UPX.
24-May-2001 (0.99.9)
Added more functionality to parser so allows (one level of) if-then-else
construct, and creation and calculation of new variables. This extended to
the "select" statement to allow arbitrary selection criteria to be
specified. The "write" command with no arguments now writes to the
screen. Quantitative variables can now be written up to 20 columns wide.
26-Apr-2001 (0.99.9)
If plevel is set to -1, Mendelian errors cause a list of possibly involved
genotypes to be printed, rather than a pedigree drawing.
12-Mar-2001 (0.99.9)
Added "write dot" to produce drawing files for the dot graph drawing
package (this does nice marriage node pedigree drawings). The "davie"
command now adds the overall sibling recurrence risk. Default output
from "apm", "asp", "ass", "hwe", "sib" and "tdt" is now a
summary table with one line of output per test. The output print level
must be increased by one to get the old output (slightly rearranged).
Genehunter type pedigree files can be produced by "wri gh file
[dummy]".
03-Oct-2000 (0.99.9)
The option "write arl file par" writes haplotypes from
two genotyped parents per pedigree for Arlequin.
26-Sep-2000 (0.99.9)
Added "write arl" to produce haplotype files for Arlequin. Haplotypes from
one child with genotyped parents per pedigree are output. The "hwe" table
now includes genotype frequencies as well as counts.
20-Sep-2000 (0.99.9)
For PAP, families containing one member are now correctly recorded in
trip.dat (as a parent of a dummy child, along with a a dummy
spouse). The "grr" command calculates recurrence risks for a single
major locus model parameterized in terms of prevalence and penetrance
ratios.
26-Jul-2000 (0.99.9)
Fixed stupid error in "sib" command where half-sibs with one missing
trait value were sometimes included in the analysis with trait value
"-9999". Did not affect full-sib H-E regression. The "ibs" command
writes out mean IBS sharing for all pairs, as "ibd" writes mean IBDs.
14-Jul-2000 (0.99.9)
Cleaned up bug in writing popln.dat, which expects only 5 allele
frequencies per line. Now prints coefficient of fraternity when "kin
pairwise" is used. Command "dis" estimates two-locus haplotype
frequencies using independent or nearly independent informative matings.
23-Jun-2000 (0.99.9)
Fixed the obligatory equivalence problem, this time affecting "xta".
Changed name to "tab". Added "sib-pair" as a locus file type.
15-Jun-2000 (0.99.9)
Added "all" as possible person ID for "edit" and "delete" command,
allowing entire pedigree to be zeroed at a particular marker or phenotype.
13-Jun-2000 (0.99.9)
Corrected the P-values for the "he1 " command. This was giving the lower
tail probability, rather than the upper tail probability.
08-Jun-2000 (0.99.9)
Pedigree and locus files produced for Genehunter 2 now have the
loci automatically sorted. Added "xta" for RxC contingency tables for
traits.
28-Feb-2000 (0.99.9)
Fixed stupid error introduced when fixing last one: caused segmentation fault
in qtdt().
25-Feb-2000 (0.99.9)
Fixed stupid error in the quantitative trait "TDT" (conditional on
parental genotypes allelic ANOVA) -- the founder genotypes and
phenotypes were not transferred to the work arrays. Linkage locus file
now always has N-1 thetas (occasionally would write N, where N is the
number of loci).
15-Feb-2000 (0.99.9)
Another equivalence problem fixed, in assoc(), was overwriting allele counts
for table (ANOVA results were OK).
14-Feb-2000 (0.99.9)
Minor tweaks to code for allele frequencies. Where all subjects in the
dataset are untyped at a marker, they are given a starting genotype of "1/1".
09-Feb-2000 (0.99.9)
Minor changes: if print level is set to 2, tdt also prints out the
transmitted and nontransmitted alleles for each informative proband. Use
of the he1 command gets the old squared-difference Haseman-Elston
regression. A quantitative trait "TDT" added to the output from
assoc.
20-Jan-2000 (0.99.9)
Replaced marginal genotypic TDT with simpler one conditioning on parental
genotypes. Simulated P-value is now based on typed ancestors of probands (at
least both parents). A memory error affecting the IBD matrix for the entire
pedigree (wribd) is repaired.
08-Dec-1999 (0.99.8)
Minor bug fixes: greatly increased speed of MC generation of starting
genotypes by replacing an inefficient loop.
26-Nov-1999 (0.99.7)
Minor bug fixes - was printing 2*N for the number of genotypes in the
pedigree file.
12-Nov-1999 (0.99.7)
Added "gh2" option to write linkage: this adds a dummy trait locus.
and writes missing quantitative traits as "-". When added to write locus
linkage, it adds the dummy locus and writes the inter-locus distances as
centimorgans rather than as recombination fractions. If set tdt first
is used, only one proband (both parents typed) in every family is used.
29-Sep-1999 (0.99.7)
Added sibship permutation test for binary trait allelic association analysis.
Fixed small bug in ibd estimation for nuclear families.
16-Sep-1999 (0.99.7)
Multilocus IBS sharing calculated for all relative pairs by the share
command. This is an unweighted statistic. The expectation and sampling
variance are generated by gene-dropping, allowing for the (sex-averaged)
linkage map. The results of this may be difficult to interpret -- for
example, I surmise they will detect marry-ins from a different ethnic
background.
03-Sep-1999 (0.99.7)
Output from Mendelian error checking enhanced slightly -- lists the
odd-allele-out in phenoset of an untyped parent causing an inconsistency.
26-Aug-1999 (0.99.6)
Fixed bugs in domix() -- reading wrong data column, and dohist() -- histogram
had spurious extra bars.
23-Aug-1999 (0.99.5)
Finally altered sib pair ibd estimation algorithm to use information from all
(full-sib) offspring when one or both parents untyped. Half-sib algorithm
unchanged. Need to similarly swap over twopoint commands. Fixed bugs
in nuclear() -- writing last sibship in each pedigree twice.
12-Aug-1999 (0.99.4)
Minor bugfixes in writing pedigree files (no whitespace between quantitative
traits. Added "grandparents" option to nuclear command, so can add if
phase information wanted.
05-Jul-1999 (0.99.3)
Adds experimental haplotyping routine (recmin based) and procedure to
pinpoint the "lowest common ancestor" of multiple affecteds in a pedigree.
The mean marital IBS sharing is now tested versus expectation in the
hwe command, and the homozygosity test can also now be applied easily to
all typed individuals. Mixtures of normals etc can be fitted to quantitative
traits, as can multiple regression analysis. Various Gconvert pedigree and
locus file writing routines moved to Sib-pair.
27-Jan-1999 (0.98.9)
Further small bugs removed (mainly printing IDs). The asp command
output includes mean IBD sharing for full-sibs, along with the "mean" test
P-value (exact binomial two-tailed).
25-Jan-1999 (0.98.8)
Removes two bugs introduced in 0.98.7: one lost every first member of a
pedigree (save the first pedigree).
21-Jan-1999 (0.98.7)
This version contains a number of minor improvements, mainly in output from
the genotyping error checking routines. Dummy records (and if necessary ID)
are now generated for missing parents of nonfounders (that is, where only one
parent is specified, or a parent ID is specified but a record for that person
was not included). This was previously performed by the auxiliary awk
program addpar.awk. Individual IDs can now be alphanumeric. Errors
occurring when MAXSIZ, the maximum pedigree size, was increased above
800 have been fixed (these arose from an overflow in the sort key).
28-Aug-1998 (0.98.3)
Fixed bug in algorithm for producing generation numbers: this was adding
extra generations when loops were encountered (the pedigree was still
correctly sorted in that parents always preceded children. Further tinkering
with the MCMC algorithm. This is still not fully correct, as it needs a
correct specification of the proposal distribution for the new fsimped()
algorithm.
14-Aug-1998 (0.98.2)
Further refinement of MCMC algorithm -- much better handling of multiple
marriages by simply replacing the algorithm of genof4() with that used by
genof3() -- the latter is used to generate starting genotypes.
21-Jul-1998 (0.97.9)
Implemented "include" command, so commands can be read from an external file.
Added ability to select on pedigree size to "select", and to trim nuclear
pedigrees to a set size.
2-Jul-1998 (0.97.7)
In Gconvert, fixed problem with "wri loc tcl", which was writing cM, not M.
There is an undocumented (save here!) "write locus mim", which writes a
header in the appropriate format which can be concatenated onto a Linkage
format file. The "keep", "drop" and "undelete" commands now accept ranges eg
"drop D1S1 to D1S10".
26-Jun-1998 (0.97.6)
Cleaned up the gener() algorithm so that it correctly assigns generation
number when a nominal single pedigree is in fact made up of multiple disjoint
pedigrees. The "gener" command now prints out each such subpedigree in turn
(if present), and the output is now more compact and easier to read.
16-Jun-1998 (0.97.5)
This adds routines that print out the mean ibd sharing at a marker locus for
all pairs of relatives in a pedigree ("ibd"), as well as ("kin") the expected
sharing given the degree of relationship (coefficient of relationship), or
the inbreeding coefficient. There is also a utility for calculating
recurrence risks under SML models ("sml"), and a command ("select") for
selecting pedigrees for later analysis based on a trait value of one of its
members (very basic). The "write pedigree" command now writes quantitative
traits as F9.4, and the martingale residuals output by the "kaplan" command
have been changed to those of Therneau et al [1990].
25-May-1998 (0.97.3)
Mainly bug repairs. Both Gconvert and Sib-pair were incorrectly recoding
sexes after the "nuclear" procedure (due to change of sex from logical to
integer in most but not this routine!). A backslash "\" as the last word on a
line now means the next line is a continuation line. The connect() routine in
Gconvert now lists all families contained within a disjoint pedigree (ie
unrelated individuals with that pedigree ID are present in the pedigree
file). Now all printed thetas are positive.
01-Apr-1998 (0.97.2)
Minor bug repairs. Labelling of paternal and maternal genotypes in output
from wrgtp() -- list of nuclear family genotypes when inconsistency detected
-- was reversed. Stacking of jobs made easier by retaining work and data
paths after "clear" issued. Gconvert upgraded to include functions such as
"clear", "set data".
13-Mar-1998 (0.97.0)
This version has further changes made to the MCMC algorithm for simulating
missing genotypes. The proposals made using fsimped() were not in fact being
tested via the Metropolis criterion, as they were being stored in array set()
and not array set2(). Again, it seems these changes make little difference to
the estimates of ibd in the test pedigrees, although they are
noticeable in the joint missing genotype distributions.
03-Mar-1998
Changes the residuals output by the "kap" procedure to the Nelson-Aalen
"martingale residuals" described by Commenge from "survivor residuals" based
on the product-limit estimator.
02-Mar-1998 (0.96.5)
This adds association analysis for a quantitative trait, implemented as a
permutation test ANOVA. A new command "kap" gives the Kaplan-Meier estimate
of the survivor function for a binary trait with variable age of onset. The
residuals from this analysis can be saved for Haseman-Elston linkage and
association analysis.
The Monte-Carlo Markov Chain algorithm for simulating missing genotypes
has been further tinkered with, following problems encountered in
multigeneration pedigrees where multiple consecutive generations are untyped.
I have reintroduced a "switch of origins" operation for heterozygote children
of untyped x untyped matings, as the existing "mutation" operation will
shuffle such parental genotypes only when the child has no offspring (a fact
that had eluded me until now). This seems to fix the trouble.
Reanalysis of the example datasets included with Sib-pair found little
difference in ibd based statistics compared to the old algorithms.
Using the Lange-Goradia algorithm for producing starting genotypes for MCMC
can still fail, as described in the original 1987 paper, when loops are
present. If this occurs, Sib-pair now switches to its alternative
gene-dropping based algorithm.
11-Feb-1998 (0.96.0)
The internal (and outputted) sorting of pedigree members is now by
(founder/nonfounder) (generation) (father ID) (mother ID) (ID). This fixes a
problem in some pedigrees (thanks to Hank Juo) where starting genotypes for
the MCMC algorithms could not be generated via the Lange-Goradia approach. It
did not affect the other MC-based approach used when imputation was "off".
Since the sort key is currently integer*4, the family size in this new
program is limited to approximately 1290.
A "set burn-in" option has been added to allow specification of the number
of iterations of the MCMC algorithms to be run prior to the iterations used
to calculate statistics. Previously, this was set to zero. In the empirical
tests I had done, the Lange-Goradia algorithm generated starting genotypes
seemed to give unbiased results, and so I had removed the burn-in. More
recently, I have found example families (with missing genotypes) where
estimated ibd is biased upwards. This bias is reduced by a suitable
number of burn-in iterations.
28-Jan-1998 (0.95.6)
Generate workfile names using time as seed, to avoid clashes on multitasking
systems. The "sta" command added. Further work on allpair: "all <tra>
wpc" calculates an experimental variant of the randomization
weighted-pairwise-statistic of Commenges (see Genet Epidemiol 1997;14:971-4).
Further prettification of Gconvert output.
23-Jan-1998 (0.95.5)
Some prettification of Gconvert output.
13-Jan-1998
Added a "set map" command to Gconvert so that the ASPEX and LINKAGE locus
files can have a sex-averaged linkage map specified. Maps can also be
specified by "set dist" to give intermarker map distances, and by adding a
map position at the end of the "set loc" line for a marker.
12-Jan-1998 (0.95.4)
Added ASPEX locus file and GDA pedigree file output to Gconvert. ASPEX reads
command files in TCL containing the marker names and intermarker
recombination distances (set to 0.50 Morgans by Gconvert). Analysis is set to
be of a Linkage format pedigree file containing one binary trait and all the
available marker loci. GDA reads an extension of the Nexus format. Gconvert
will only write out founders unless told otherwise, as GDA is designed for
population genetic work. The likelihood ratios for the IBD based ASP analysis
are now corrected back (!) to be twice their former values. The HWE
chi-square has been changed to a LR chi-square, to make it more robust in
sparse tables. Individual IDs are now written without leading zeroes in most
cases, so person 112-00000001 is now written 112-1. A "time" command will
give the time elapsed since the program started, or since "time" was last
called.
11-Dec-1997
The error in the loop accumulating grandparent-grandchild pairs was present
for the parent-offspring pair test as well.
10-Dec-1997 (0.95.3)
Family correlations and recurrence risks now include all pairs regardless of
id numbering (the old version would miss grandparent-grandchild pairs where
the grandchild ID number was higher than that of the grandparent AND the
grandparent was a nonfounder). The "fre[quency]" command now has a synonym
"des[criptive]", and can be applied to a single quantitative trait.
Previously, the correlations, sibship variance test etc were only available
via a global "fre" command. The "und[elete]" command now has a default of
returning all deleted loci back to the analysis.
02-Dec-1997
Reformatted apm output (list of affecteds and unaffecteds now after family
summary statistics when plevel=1). Ascertainment corrected segregation ratios
and standard errors default to those for complete ascertainment (reducing to
those of Li & Mantel) if no variable corresponding to proband status is
defined (same as "davie trait trait").
26-Nov-1997
Now trims long locus names (>10 characters) when parsing "drop", "keep"
and "undelete" commands to correctly match. Does not check for names
different at greater than 10th character.
25-Nov-1997 (0.95.2)
Repaired a bug in the "trans" command (boxcox()). If genotypes preceded a
quantitative phenotype in a pedigree record, the wrong column would be
transformed.
19-Nov-1997 (0.95.1)
Repaired annoying bug in Haseman-Elston regression analysis. Function for
calculating ibd-sharing was treating the starting genotypes used for MCMC as
if these were observed (therefore does not affect versions prior to 0.93).
Never a problem for Gconvert's wrsib command.
Elapsed time now measured by time(), rather than secnds() for DOS version.
A refinement only of interest for overnight runs.
Program now warns if there are fewer fields for a pedigree record than
expected -- previously these were silently padded out as missing. It still
silently ignores extra fields.
Memory utilisation decreased by equivalencing several work arrays.
13-Nov-1997 (0.95.0)
Allowable whitespace in pedigree and control files now includes tabs. Shell
commands are echoed as a comment (self documenting calls to other programs).
06-Nov-1997
Fixed bug where multiple drop/keep/undelete statements clashed. Removed
punctuation stripping from parser "();". These were originally present so
that GAS type control files could be used as a template without editing.
05-Nov-1997
Gconvert now writes an additional line to the dummy description in an
outputted LINKAGE style locus file (the variance multiplier), and sets
quantitative trait zero values to 0.0001, so they are not treated as missing
by LINKAGE.
24-Oct-1997
Summary of MC P-values for APM is now done using an inverse Z transform of
the P-value for each pedigree. fval() now reads "y" and "n" as 2.0 and 1.0,
making transformation/recoding more transparent.
25-Sep-1997
Repaired bug which made ibs affected half-sib pair chi-square incorrect for
sparse tables (few pairs), by changing from Pearson to LR chi-square.
Improved gener() algorithm so marry-in generation numbers correct in the
presence of loops. DOS executable now a compressed version (using DJP),
reducing the size of the file by half.
21-Aug-1997
Repaired bug which meant that the "generation" command only worked if it was
the first command after "run". Updated on-line help.
20-Aug-1997
Implemented routine to compare marker homozygosity in probands to that
expected based on the sample allele frequencies.
11-Aug-1997
Added keywords "mat" and "pat" to perform TDT analyses only on the
contributions of the mother or father of the proband.
06-Aug-1997
Updated Gconvert by adding help, setting all keywords to 3 letter minimum.
03-Aug-1997
Implemented exact two-tailed probabilities for single-allele TDT.
16-Jul-1997
Added routine for correcting segregation ratios for ascertainment.
15-Jul-1997
Added routine for testing HWE at all markers.
28-Jun-1997
Added "set tdt bot[h parents]|one [parent]" limiting TDT if requested to
cases where both parents typed. Using cases where one parent is untyped can
lead to bias, esp for diallelic markers with unequal allele frequencies.
Alpha version of allpairs() routine set working.
26-May-1997
Added IBS based ASP analysis to Sib-pair, and an IBS based test for checking
if full-sib pairs are not half-sib pairs (or unrelated!) to Gconvert.
06-Mar-1997
Prettified output of describe().
03-Mar-1997 (0.94)
Sib-pair writing out all simulated genotypes regardless of imputation level
to pedigree file. Fixed so now only writes all genotypes if imputation level
3. Added "read linkage" to read Linkage-style pedigree files without having
to recode zeroes to missing for quantitative traits. DJGPP V2 now used to
produce DOS executable.
05-Feb-1997
Fixed up problem with imputation level 3 in Gconvert. This problem long since
fixed in Sib-pair. Involved failure to correctly update genotype array in
sequential imputation if genotype became fixed as side effect of another
target.
Feb-1997
Trialled djgpp v2. Code seems to be slightly slower than v1 equivalent for
some examples, but faster in others. Prepared source code for release.
18-Dec-1996 (0.93)
Released Sib-pair with successful MCMC simulation of missing genotypes. Added
to the freq command to gives familial correlations and a version of the Fain
sibship variance test for a quantitative trait. Included generation command
to lists pedigree members by sibship and generation number. Included
transform command to transform a quantitative trait.
18-Oct-1996 (0.92)
Cleaned up several bugs involving recoding/downcoding of alleles [only met if
multiple recode statements involving the same marker], calculation of
P-values for very large values of chi-square statistics [evaluated
incorrectly as 1.0 or NaN in some cases], and error checking [would delete
pedigree files with errors in the parent field].
Sep-1996
Versions of Sib-pair for ASP analysis following Faraway (1992). GPM ibs
algorithms weight improved as Patrick Ward's program allowed calculation of
exact result for comparison.
Jul-1996
Changed MC/randomization routines to sequential approach of Besag.
Mar-1996
Various unreleased versions using different MCMC algorithms.
Aug-1995
Moved Lange-Goradia algorithm from using random-access file to entirely in
memory. Required moving from MS-Fortran 4.1 to f2c/djgpp for DOS version.
May-1995
Added estimation of sib-pair IBD sharing where one or two untyped parents.
Apr-1995
First code for multiallelic TDT tests. Imputation done only within nuclear
families. Input and output code based on earlier PHI program. GAS pedigree
file structure chosen as more readable than LINKAGE style (essentially
identical).